We consider the problem of estimation of a shift parameter of an unknown symmetric function in Gaussian white noise. We introduce a notion of semiparametric second-order efficiency and propose estimators that are semiparametrically efficient and second-order efficient in our model. These estimators are of a penalized maximum likelihood type with an appropriately chosen penalty. We argue that second-order efficiency is crucial in semiparametric problems since only the second-order terms in the asymptotic expansion of the risk account for the behavior of the "nonparametric component" of a semiparametric procedure, and they are not dramatically smaller than the first-order terms.

1. Introduction. Semiparametric statistical models are the ones containing a finite-dimensional parameter of interest $\theta$ and an infinite-dimensional nuisance parameter $f$ which is a member of some large functional class. The goal is then to estimate $\theta$ efficiently without knowing $f$. A comprehensive account of the theory of semiparametric estimation is given in the book of Bickel, Klaassen, Ritov and Wellner [3]. In particular, it is shown there that for many semiparametric models there exist estimators attaining the same asymptotic performance as efficient parametric estimators constructed for the problem where $f$ is completely specified. In other words, for such semiparametric models there is no loss of efficiency as compared to the corresponding parametric models with known $f$. These semiparametric models are usually called adaptive, but we prefer here to call them S-adaptive, or semiparametrically adaptive, in order to avoid confusion with nonparametric adaptivity to unknown smoothness of $f$. Estimators attaining parametric efficiency in S-adaptive models will be called S-adaptive (or efficient) estimators. Here and in what follows efficiency is understood in a local asymptotic minimax sense.

There exist various methods of constructing S-adaptive estimators. A general feature of these methods is that they proceed by "eliminating" the nonparametric component $f$, thus reducing the original semiparametric problem to a suitably chosen parametric one. The most common approach is to specify a least favorable parametric submodel of the full semiparametric model, locally in a neighborhood of $f$, and to estimate $\theta$ in such a submodel ([3, 22, 24, 30, 31, 32] and the references cited therein). Least favorable parametric submodels turn out to depend on $f$ only via a score function. "Elimination" of $f$ under this approach means estimating the efficient score function nonparametrically. The resulting estimators of $\theta$ are often defined via one-step procedures that involve preliminary estimators of $\theta$ and nonparametric estimators of the efficient score function. We note here, in connection with the discussion that follows below, that results on efficiency and S-adaptivity are not very sensitive to the choice of preliminary nonparametric estimates of the efficient score function. For example, kernel, orthogonal series, nonparametric maximum likelihood and other estimates can be used, under rather wide assumptions on their parameters, such as kernels, bandwidths, etc. The important question of how to choose these parameters in practice is left open. Among other approaches that allow one to eliminate $f$ efficiently we mention profile likelihood techniques [25] and invariance-based inference [13].

Thus, for a variety of semiparametric models, the statistician actually has an entire library of efficient (S-adaptive) estimators of $\theta$. Which estimator is the best one? The theory discussed above does not answer this question because it deals only with the first-order asymptotics, which is the same for all S-adaptive estimators in a given model. Distinguishing between these estimators is possible on the basis of higher-order asymptotics. This motivates us to study here second-order asymptotics and second-order semiparametric efficiency. We would like to emphasize that a study of second-order effects is more important for semiparametric models than for purely parametric ones, and that it is crucial for practical implementation, for at least the following reasons.

• This is a compelling way to distinguish between various efficient semiparametric methods and to choose the best among them. More specifically, it allows one to choose optimally the smoothing parameters that define the "nonparametric component" of a given family of efficient semiparametric procedures.

• Second-order terms in asymptotics for semiparametric estimators are not dramatically smaller than the first-order terms; they might in fact be quite comparable to each other for moderate sample sizes. Second-order terms depend on the smoothness of $f$. For example, in a typical case of twice differentiable $f$ we get second-order terms $\sim n^{-7/10}$, the first-order asymptotics being as usual $n^{-1/2}$, where $n$ is the sample size. This differs from the purely parametric situation where the second-order terms decrease as $n^{-1}$ (cf. [20]).

Whereas first-order efficiency considerations for semiparametric models 
are essentially of a parametric flavor, the second-order ones come from 
nonparametric function estimation. Therefore, it is not surprising that the 
importance of second-order semiparametric asymptotics was first realized 
in the literature on nonparametric smoothing. Härdle and Tsybakov [15]
pointed out that, in the single index model, the second-order term of the risk 
of the average derivative estimator is not significantly smaller than the first- 
order one and suggested choosing the optimal bandwidth by minimizing an 
asymptotic approximation of the second-order term. Mammen and Park [21] 
proceeded in a similar way to derive the optimal bandwidth for estimation of 
the efficient score function in the symmetric location problem. These papers 
considered specific families of estimators and did not deal with second-order 
efficiency among all estimators. Golubev and Härdle [9, 10] studied partial
linear models and suggested second-order efficient estimators as well as their 
nonparametrically adaptive versions. These results rely strongly on the lin- 
earity and additivity of the parametric component in partial linear models. 
The problem of how to treat second-order efficiency for essentially nonlinear 
models has remained open, and our aim here is to give a solution to this 
problem. 

We restrict our study to one basic model that seems to capture the main difficulties in deriving second-order efficiency, being at the same time simple enough to avoid unnecessary technicalities. Namely, we consider the estimation of a shift parameter $\theta$ based on observations

$$x^\varepsilon(t) = f(t-\theta) + \varepsilon n(t), \qquad t \in [-1/2, 1/2], \tag{1}$$

where $n(t)$ is the standard Gaussian white noise process on $[-1/2, 1/2]$ (cf. [16], Chapter 3), $f(\cdot)$ is a smooth symmetric [i.e., $f(t) = f(-t)$ for all $t$] periodic function with period 1, and $0 < \varepsilon < 1$ is a known noise parameter. With $\varepsilon = 1/\sqrt{n}$, where $n$ is an equivalent sample size, model (1) can be viewed as a "Gaussian white noise analog" of the classical symmetric location model [2, 26, 27].

If the signal $f$ is known, the maximum likelihood estimator

$$\hat\theta_{ML} = \arg\max_{\tau} \int_{-1/2}^{1/2} f(t-\tau)\, x^\varepsilon(t)\, dt$$

is locally asymptotically minimax (e.g., [17]). In particular, its mean square risk satisfies

$$\lim_{\varepsilon\to 0}\ \sup_{\theta\in\Theta} E_{\theta,f}\big[(\hat\theta_{ML}-\theta)^2 I^\varepsilon(f)\big] = 1 \tag{2}$$

for any sufficiently small interval $\Theta$, where

$$I^\varepsilon(f) = \varepsilon^{-2}\int_{-1/2}^{1/2}[f'(t)]^2\,dt$$

is the Fisher information associated with model (1) and $E_{\theta,f}$ is the expectation with respect to the distribution of the observation $X^\varepsilon = \{x^\varepsilon(t),\ t\in[-1/2,1/2]\}$ in model (1). The corresponding probability measure will be denoted by $P_{\theta,f}$.

In a semiparametric setup where $f$ is not known, an efficient and S-adaptive estimator of $\theta$ is suggested by Golubev [8] for a model close to (1) where the observations are available for all $t\in\mathbb{R}$ and $f$ is not periodic. Härdle and Marron [14] discussed semiparametric estimation for models with discrete observations similar to (1) involving also a scale parameter.

Here we construct an S-adaptive and second-order efficient semiparametric estimator of $\theta$ in model (1). It is of penalized maximum likelihood type with an appropriately chosen penalty. To derive this estimator, we introduce a prior on $f$ and then maximize the posterior density of $f$ given the observations, both in $\theta$ and in $f$. This procedure is of a Bayesian type w.r.t. $f$ for fixed $\theta$. It can be viewed in the following way: we "eliminate" the nonparametric component using a Bayesian argument, while the final estimation of $\theta$ is realized by maximum likelihood.

We conjecture that the penalized maximum likelihood approach using 
similar arguments would be a proper tool to get second-order efficient esti- 
mators for other semiparametric models, and we believe that our technique 
of proving minimax lower bounds with second-order terms might be useful 
there as well. 

This paper is organized as follows. In Section 2 we give some heuristics 
concerning the first- and second-order efficiency in model (1). Section 3 con- 
tains the argument leading to a class of estimators defined by a sequence of 
weights: we show how these estimators (that are of penalized maximum like- 
lihood type) are derived from Bayesian considerations. In Section 4 we show 
that, under certain assumptions on the sequence of weights, the estimators 
from this class are S-adaptive and we study their second-order asymptotics.
Section 5 discusses a minimax problem for the second-order term. In Sec- 
tion 6 we give a locally asymptotically minimax lower bound and suggest a 
second-order efficient estimator obtained with a particular choice of weights. 
Sections 7-9 contain the proofs. 
2. Some heuristics. This section provides some useful heuristics about first- and second-order semiparametric efficiency in model (1).

We first explain the result (2) obtained for known $f$. An intuitive way to do this is based on a local linear approximation of the signal $f(t-\theta)$. Suppose that $\theta$ belongs to a small interval $[\theta_0-\Delta_\varepsilon, \theta_0+\Delta_\varepsilon]$, where $\Delta_\varepsilon>0$ and $\theta_0$ are known and $\Delta_\varepsilon\to0$ as $\varepsilon\to0$. This assumption is essentially equivalent to the existence of a $\Delta_\varepsilon$-consistent estimator of $\theta$. For simplicity, we assume that $\Delta_\varepsilon\sim\varepsilon$ [for rigorous proofs one needs to take $\Delta_\varepsilon$ slightly larger than $\varepsilon$, so that $\Delta_\varepsilon/\varepsilon\to\infty$ as $\varepsilon\to0$, e.g., $\Delta_\varepsilon = \varepsilon\sqrt{\log(\varepsilon^{-1})}$]. Then, replacing $f(t-\theta)$ in (1) by its linear approximation $f(t-\theta_0) - f'(t-\theta_0)(\theta-\theta_0)$, we get the linear model

$$x^\varepsilon(t) = f(t-\theta_0) - f'(t-\theta_0)(\theta-\theta_0) + \varepsilon n(t), \qquad t\in[-1/2,1/2]. \tag{3}$$

When $f$ is known we can subtract $f(t-\theta_0)$ from these observations and change the sign; since $-n(t)$ is again a standard Gaussian white noise, this gives the equivalent model

$$y^\varepsilon(t) = f'(t-\theta_0)(\theta-\theta_0) + \varepsilon n(t), \qquad t\in[-1/2,1/2].$$

Estimation of $\theta-\theta_0$ in this linear regression model is now straightforward. Multiplying the observation $y^\varepsilon(t)$ by $f'(t-\theta_0)$, integrating over the interval $[-1/2,1/2]$ and dividing by $\int_{-1/2}^{1/2}[f'(t)]^2\,dt = \varepsilon^2 I^\varepsilon(f)$, we get the Gaussian shift model

$$Y^\varepsilon = \theta - \theta_0 + [I^\varepsilon(f)]^{-1/2}\xi, \tag{4}$$

where $\xi\sim\mathcal N(0,1)$. Clearly, $Y^\varepsilon$ is an efficient estimator of $\theta-\theta_0$. Thus, the argument here is based on replacing the original nonlinear estimation problem by a Gaussian shift experiment. A deep theoretical background for this argument is given by Le Cam's theory of asymptotic equivalence [19].

Suppose now that $f$ is an unknown symmetric function. Then again we can use model (3) to approximate the initial model (1). But the approximating model is now nonlinear since it contains the product of the unknown parameters $\theta-\theta_0$ and $f'(t-\theta_0)$. Fortunately, this is not a problem, and in this case one can also construct an efficient estimator.

Indeed, since $f'$ is an odd function and $f$ is an even function, projecting the observations (3) on the spaces of even and odd functions we get

$$x_e^\varepsilon(t) = f(t-\theta_0) + \varepsilon n_e(t), \tag{5}$$

$$x_o^\varepsilon(t) = -f'(t-\theta_0)(\theta-\theta_0) + \varepsilon n_o(t), \tag{6}$$

where $n_o(t)$ and $n_e(t)$ are two independent Gaussian white noise processes. Based on $x_e^\varepsilon(t)$, we estimate the derivative $f'(t-\theta_0)$ and then plug this estimator into (6) to recover the parameter of interest from the observation $x_o^\varepsilon(t)$. This allows us to obtain an efficient (S-adaptive) estimator of $\theta$.






A. S. DALALYAN, G. K. GOLUBEV AND A.B. TSYBAKOV 



We turn now to a heuristic derivation of second-order asymptotics. In order to do that we simplify our approximate statistical model (5)-(6), assuming that $\theta_0 = 0$ and translating the observations $x_e^\varepsilon(t), x_o^\varepsilon(t)$ into a sequence space.

We will suppose throughout the paper that the unknown function $f$ can be represented as

$$f(t) = \sqrt2\sum_{k=1}^\infty f_k\cos(2\pi kt), \tag{7}$$

where the Fourier series converges for all $t$ and the Fourier coefficients $f_k$ are defined by

$$f_k = \sqrt2\int_{-1/2}^{1/2} f(t)\cos(2\pi kt)\,dt.$$

Using this and projecting (5) and (6) on the trigonometric basis functions $\sqrt2\cos(2\pi kt)$ and $\sqrt2\sin(2\pi kt)$, respectively, we obtain the sequence model

$$X_k = f_k + \varepsilon\xi_k, \qquad k = 1,2,\ldots, \tag{8}$$

$$X_k^* = \theta(2\pi k)f_k + \varepsilon\eta_k, \qquad k = 1,2,\ldots, \tag{9}$$

where $(\xi_k, \eta_k,\ k=1,2,\ldots)$ are i.i.d. standard Gaussian random variables. The nuisance parameters $f_k$ can be estimated from (8) by well-known techniques for the Gaussian sequence model (see, e.g., [29]). In particular, it is natural to use linear estimators of $f_k$ defined by $\hat f_k = h_kX_k$, where $h_k = h_k(\varepsilon)$ are such that $\sum_{k=1}^\infty h_k^2<\infty$. An example is $h_k = \mathbb 1\{k\le N_\varepsilon\}$, where $\mathbb 1\{\cdot\}$ is the indicator function and $N_\varepsilon$ is an integer such that $N_\varepsilon\to\infty$ as $\varepsilon\to0$.

Next, considering separately model (9), it is not hard to show that if the $f_k$ were known, the maximum likelihood (least squares) estimator

$$\theta_f^* = \sum_{k=1}^\infty(2\pi k)f_kX_k^* \Big/ \sum_{k=1}^\infty(2\pi k)^2f_k^2 \tag{10}$$

would be asymptotically minimax for $\theta$. At first sight, it seems natural to plug in $\hat f_k$ instead of $f_k$ in the expression for $\theta_f^*$, thus obtaining the estimator

$$\hat\theta = \sum_{k=1}^\infty(2\pi k)h_kX_kX_k^* \Big/ \sum_{k=1}^\infty(2\pi k)^2h_k^2X_k^2. \tag{11}$$

However, this estimator is not optimal: it can have a very large bias. The reason is that the functional $\sum_{k=1}^\infty(2\pi k)^2f_k^2$ in (10) is not estimated correctly: since $E_{\theta,f}X_k^2 = f_k^2+\varepsilon^2$, the denominator of (11) contains the additional term $\varepsilon^2\sum_{k}(2\pi k)^2h_k^2$ in expectation, which need not be small. An improved version of $\hat\theta$ can be suggested in the form

$$\theta^* = \sum_{k=1}^\infty(2\pi k)h_kX_kX_k^* \Big/ \sum_{k=1}^\infty(2\pi k)^2h_k(X_k^2-\varepsilon^2). \tag{12}$$

As compared to (11), we replace $h_k^2$ by $h_k$ in the denominator and replace $X_k^2$ by the unbiased estimator $X_k^2-\varepsilon^2$ of $f_k^2$. This turns out to improve significantly the asymptotics of the risk.
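The contrast between (10), (11) and (12) can be seen in a small Monte Carlo sketch of the local sequence model (8)-(9). This is a minimal illustration under stated assumptions, not the authors' code: the coefficients $f_k = k^{-3}$, the truncation $K$, the cut-off $N$ and the values of $\varepsilon$ and $\theta$ are all hypothetical choices, and the cut-off is deliberately large so that the bias of (11) is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: f_k = k^{-3}, theta of the order of eps (Delta_eps ~ eps).
K, eps, theta, n_rep = 400, 1e-3, 1e-3, 2000
k = np.arange(1, K + 1)
w = 2 * np.pi * k                        # the factors 2*pi*k
f = k ** -3.0
norm_fp2 = np.sum(w**2 * f**2)           # ||f'||^2 = eps^2 * I^eps(f)
fisher = norm_fp2 / eps**2               # I^eps(f)

# Projection weights h_k = 1{k <= N}; N is deliberately large so that the bias of (11) shows.
N = 100
h = (k <= N).astype(float)

# n_rep independent draws from the local sequence model (8)-(9).
X = f + eps * rng.standard_normal((n_rep, K))
X_star = theta * w * f + eps * rng.standard_normal((n_rep, K))

num = np.sum(w * h * X * X_star, axis=1)
th_oracle = (X_star @ (w * f)) / norm_fp2                     # (10), f_k known
th_naive = num / np.sum(w**2 * h**2 * X**2, axis=1)           # (11)
th_star = num / np.sum(w**2 * h * (X**2 - eps**2), axis=1)    # (12)

def normalized_risk(est):
    """Monte Carlo estimate of E[(est - theta)^2 I^eps(f)]."""
    return fisher * np.mean((est - theta) ** 2)

r_oracle, r_naive, r_star = (normalized_risk(e) for e in (th_oracle, th_naive, th_star))
print(f"oracle (10): {r_oracle:.3f}  naive (11): {r_naive:.3f}  corrected (12): {r_star:.3f}")
```

With these choices the oracle's normalized risk should be close to 1, and the corrected estimator (12) should be clearly ahead of (11); the exact figures depend on the seed.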

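We now give a heuristic analysis of the risk of $\theta^*$. Using (8)-(9) and the notation $\|f'\|^2 = \varepsilon^2I^\varepsilon(f) = \sum_{k=1}^\infty(2\pi k)^2f_k^2$, we obtain

$$(\theta^*-\theta)\sqrt{I^\varepsilon(f)} = \frac{\|f'\|\,\big(\zeta^\varepsilon - \varepsilon^{-1}\theta\, r_1^\varepsilon\big)}{\sum_{k=1}^\infty(2\pi k)^2h_kf_k^2 + r_2^\varepsilon}, \tag{13}$$

where

$$\zeta^\varepsilon = \sum_{k=1}^\infty(2\pi k)h_kf_k\eta_k + \varepsilon\sum_{k=1}^\infty(2\pi k)h_k\xi_k\eta_k,$$

$$r_1^\varepsilon = \varepsilon\sum_{k=1}^\infty(2\pi k)^2h_kf_k\xi_k + \varepsilon^2\sum_{k=1}^\infty(2\pi k)^2h_k(\xi_k^2-1),$$

$$r_2^\varepsilon = 2\varepsilon\sum_{k=1}^\infty(2\pi k)^2h_kf_k\xi_k + \varepsilon^2\sum_{k=1}^\infty(2\pi k)^2h_k(\xi_k^2-1).$$

In order to simplify the expression in (13) we assume that $\sum_{k=1}^\infty(2\pi k)^4f_k^2<\infty$ and that the $h_k$ are chosen so that $\varepsilon^2\sum_{k=1}^\infty(2\pi k)^4h_k^2<\infty$. Under these conditions, using $|\theta|\le\Delta_\varepsilon\sim\varepsilon$, one obtains that $E_{\theta,f}[(\varepsilon^{-1}\theta r_1^\varepsilon)^2] = O(\varepsilon^2)$. It is also straightforward to see that $E_{\theta,f}(\zeta^\varepsilon r_2^\varepsilon) = 0$, and to show, with some easy algebra, that

$$E_{\theta,f}[(r_2^\varepsilon)^2] = 4\varepsilon^2\sum_{k=1}^\infty h_k^2(2\pi k)^4f_k^2 + 2\varepsilon^4\sum_{k=1}^\infty h_k^2(2\pi k)^4 = O(\varepsilon^2).$$

Next note that we are allowed to drop the terms of order $O(\varepsilon^2)$, since their contribution to the risk (asymptotically, in the mean absolute value) is smaller than the final second-order asymptotics that we are going to obtain. Up to these terms, we get from (13)

$$(\theta^*-\theta)\sqrt{I^\varepsilon(f)} \approx \|f'\|\,\zeta^\varepsilon\Big(\sum_{k=1}^\infty(2\pi k)^2h_kf_k^2\Big)^{-1},$$

and thus

$$E_{\theta,f}\big[(\theta^*-\theta)^2I^\varepsilon(f)\big] \approx \|f'\|^2\Big[\sum_{k=1}^\infty(2\pi k)^2h_kf_k^2\Big]^{-2}E_{\theta,f}[(\zeta^\varepsilon)^2] = \frac{\|f'\|^2\sum_{k=1}^\infty(2\pi k)^2h_k^2(\varepsilon^2+f_k^2)}{\Big(\sum_{k=1}^\infty(2\pi k)^2h_kf_k^2\Big)^2}.$$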
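This expression can be simplified if we assume that $0\le h_k\le1$ and

$$\Big[\sum_{k=1}^\infty(1-h_k)(2\pi k)^2f_k^2\Big]^2 = o\Big(\sum_{k=1}^\infty(1-h_k)^2(2\pi k)^2f_k^2\Big) \tag{14}$$

as $\varepsilon\to0$. Then, in particular, $\sum_{k=1}^\infty(1-h_k)(2\pi k)^2f_k^2 = o(1)$ (because $(1-h_k)^2\le1-h_k$), and one obtains

$$\Big(\sum_{k=1}^\infty(2\pi k)^2h_kf_k^2\Big)^{-2} = \Big(\|f'\|^2 + \sum_{k=1}^\infty(2\pi k)^2(h_k-1)f_k^2\Big)^{-2} = \|f'\|^{-4}\Big[1 - 2\|f'\|^{-2}\sum_{k=1}^\infty(2\pi k)^2(h_k-1)f_k^2 + O\Big(\Big(\sum_{k=1}^\infty(1-h_k)(2\pi k)^2f_k^2\Big)^2\Big)\Big].$$

Using this and (14) we derive the following expansion for the risk:

$$E_{\theta,f}\big[(\theta^*-\theta)^2I^\varepsilon(f)\big] \approx \Big[1+\frac{\sum_{k=1}^\infty(2\pi k)^2\{\varepsilon^2h_k^2+(h_k^2-1)f_k^2\}}{\|f'\|^2}\Big]\Big[1-\frac{2\sum_{k=1}^\infty(2\pi k)^2(h_k-1)f_k^2}{\|f'\|^2}\Big] = 1+(1+o(1))\frac{R^\varepsilon[f,h]}{\|f'\|^2}, \tag{15}$$

where

$$R^\varepsilon[f,h] = \sum_{k=1}^\infty(2\pi k)^2\big[(1-h_k)^2f_k^2+\varepsilon^2h_k^2\big]. \tag{16}$$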
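The passage from the risk approximation preceding (14) to the expansion (15)-(16) can be checked numerically. The sketch below assumes hypothetical coefficients $f_k = k^{-3}$ and projection weights with a hypothetical cut-off $N_\varepsilon \approx \varepsilon^{-2/5}$; it compares the excess of that heuristic risk over 1 with $R^\varepsilon[f,h]/\|f'\|^2$, and the ratio of the two should be close to 1.

```python
import numpy as np

def heuristic_risk(f, h, eps):
    """||f'||^2 sum (2 pi k)^2 h_k^2 (eps^2 + f_k^2) / (sum (2 pi k)^2 h_k f_k^2)^2."""
    w2 = (2 * np.pi * np.arange(1, f.size + 1)) ** 2
    norm_fp2 = np.sum(w2 * f**2)
    return norm_fp2 * np.sum(w2 * h**2 * (eps**2 + f**2)) / np.sum(w2 * h * f**2) ** 2

def second_order_functional(f, h, eps):
    """R^eps[f, h] as in (16)."""
    w2 = (2 * np.pi * np.arange(1, f.size + 1)) ** 2
    return np.sum(w2 * ((1 - h) ** 2 * f**2 + eps**2 * h**2))

K = 5000
k = np.arange(1, K + 1)
f = k ** -3.0                                    # hypothetical coefficients
norm_fp2 = np.sum((2 * np.pi * k) ** 2 * f**2)

ratios = []
for eps in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:
    N = int(eps ** -0.4)                         # hypothetical cut-off N_eps ~ eps^(-2/5)
    h = (k <= N).astype(float)                   # projection weights
    excess = heuristic_risk(f, h, eps) - 1.0     # second-order part of the heuristic risk
    target = second_order_functional(f, h, eps) / norm_fp2
    ratios.append(excess / target)
    print(f"eps={eps:.0e}  N={N:4d}  ratio={excess / target:.4f}")
```

For projection weights the condition (14) reduces to the tail $\sum_{k>N_\varepsilon}(2\pi k)^2f_k^2$ tending to 0, which is why this choice is a convenient test case.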



The second-order term in (15), that is, the functional $\|f'\|^{-2}R^\varepsilon[f,h]$, has a clear statistical meaning. Suppose that we know $\theta$ and we want to estimate the derivative $f'(t-\theta)$ based on observations (1). To measure the quality of an estimator $\hat f'(t-\theta)$ we choose the relative mean integrated squared error

$$\mathrm{Err}(\hat f',f') = \frac{1}{\|f'\|^2}\,E_{\theta,f}\int_{-1/2}^{1/2}\big[\hat f'(t-\theta)-f'(t-\theta)\big]^2dt.$$

Consider a linear estimator

$$\hat f'(t-\theta) = -2\sum_{k=1}^\infty h_k(2\pi k)\sin[2\pi k(t-\theta)]\int_{-1/2}^{1/2}\cos[2\pi k(s-\theta)]\,x^\varepsilon(s)\,ds.$$

Using (7), it is easy to show that $\mathrm{Err}(\hat f',f') = \|f'\|^{-2}R^\varepsilon[f,h]$. Thus, the expression $\|f'\|^{-2}R^\varepsilon[f,h]$ is a relative mean integrated squared error for nonparametric estimation of the derivative of $f$ in the Gaussian white noise model. We see that the second-order expansion (15) relates two statistical problems: semiparametric estimation of $\theta$ and nonparametric estimation in $L_2$-norm of the Fisher informant $f'(t-\theta)$. It also reveals a presumably general fact that second-order asymptotic terms in semiparametric problems account for the mean integrated squared error of recovering the Fisher informant.
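The identity $\mathrm{Err}(\hat f',f') = \|f'\|^{-2}R^\varepsilon[f,h]$ can be illustrated by simulation in the sequence space, using Parseval's identity for the orthonormal functions $\sqrt2\sin(2\pi kt)$. This is a sketch with hypothetical coefficients, weights and noise level; it works directly with Fourier coefficients rather than with a discretized white noise path.

```python
import numpy as np

rng = np.random.default_rng(1)
K, eps, n_rep = 300, 1e-2, 4000
k = np.arange(1, K + 1)
w = 2 * np.pi * k
f = k ** -3.0                             # hypothetical Fourier coefficients of f
h = (k <= 8).astype(float)                # hypothetical projection weights

# With theta known, sqrt(2) * int cos[2 pi k (t - theta)] x^eps(t) dt = f_k + eps * N(0, 1).
obs = f + eps * rng.standard_normal((n_rep, K))
f_hat = h * obs                           # coefficients of the linear estimator

# Since f'(t - theta) = -sqrt(2) sum (2 pi k) f_k sin[2 pi k (t - theta)], Parseval gives
# int (f_hat' - f')^2 dt = sum (2 pi k)^2 (f_hat_k - f_k)^2.
ise = np.sum(w**2 * (f_hat - f) ** 2, axis=1)
norm_fp2 = np.sum(w**2 * f**2)
err_mc = np.mean(ise) / norm_fp2                                               # Monte Carlo Err
err_formula = np.sum(w**2 * ((1 - h) ** 2 * f**2 + eps**2 * h**2)) / norm_fp2  # R / ||f'||^2
print(err_mc, err_formula)
```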

3. Penalized maximum likelihood estimator. In Section 2 we have sketched second-order asymptotics for the estimator $\theta^*$ in model (8)-(9), which is only a local approximation of the original model (1) in a neighborhood of $\theta_0 = 0$. Thus, $\theta^*$ is not directly applicable to model (1). Of course, the procedure can be corrected: instead of replacing $\theta_0$ by 0, one should replace it by a preliminary $\varepsilon$-consistent estimator of $\theta$. This would lead to a two-stage estimation procedure that would presumably have the desired second-order behavior under some conditions. There exists, however, a direct and more elegant estimator achieving the same result. This estimator is inspired by the Bayes argument that we are going to describe now.

Given model (1), we have at our disposal the following series of discrete observations:

$$x_k = \sqrt2\int_{-1/2}^{1/2}\cos(2\pi kt)\,x^\varepsilon(t)\,dt,\qquad x_k^* = \sqrt2\int_{-1/2}^{1/2}\sin(2\pi kt)\,x^\varepsilon(t)\,dt,\qquad k=1,2,\ldots.$$

Projecting (1) on the trigonometric basis functions on $[-1/2,1/2]$ and using (7), we obtain

$$x_k = f_k\cos(2\pi k\theta)+\varepsilon\xi_k,\qquad x_k^* = f_k\sin(2\pi k\theta)+\varepsilon\eta_k,\qquad k=1,2,\ldots. \tag{17}$$

Here $(\xi_k,\eta_k,\ k=1,2,\ldots)$ are i.i.d. standard Gaussian random variables.

Our aim is to define a suitable estimator of $\theta$ using these observations. A general idea is to "eliminate" first the nonparametric component of the model, represented by the sequence of Fourier coefficients $f_k$ (which we consider to be nuisance parameters). We will proceed as follows. Assume for a moment that the $f_k$'s are independent zero-mean Gaussian random variables with variances $\sigma_k^2$. Assume also that they are independent of the noise sequence $\{\xi_k,\eta_k\}$. We then replace the sequence $\{f_k\}$ by the most probable sequence $\{f_k^*\}$ with respect to the posterior distribution of $\{f_k\}$ given $\{x_k,x_k^*\}$. Clearly, this sequence will depend only on $\{x_k,x_k^*\}$ and $\theta$, and thus $\{f_k\}$ will be eliminated. The final step will be to maximize over $\theta$ the remaining likelihood, thus obtaining an estimator of $\theta$.

To define the procedure formally, note that the problem factorizes: it is sufficient to find $f_k^*$ for a fixed $k$, since the triples $(x_k, x_k^*, f_k)$ with different $k$ are independent. Maximizing over $f_k$ the posterior density of $f_k$ given $x_k, x_k^*$ is equivalent to maximizing the joint density of $x_k, x_k^*, f_k$, which equals



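$$p_\theta(x_k,x_k^*,f_k) = \frac{1}{(2\pi)^{3/2}\varepsilon^2\sigma_k}\exp\Big\{-\frac{(x_k-f_k\cos(2\pi k\theta))^2+(x_k^*-f_k\sin(2\pi k\theta))^2}{2\varepsilon^2}-\frac{f_k^2}{2\sigma_k^2}\Big\}$$

$$= A(x_k,x_k^*)\exp\Big\{\frac{\sqrt2 f_k}{\varepsilon^2}\int_{-1/2}^{1/2}\cos[2\pi k(t-\theta)]\,x^\varepsilon(t)\,dt - \frac{f_k^2(\sigma_k^2+\varepsilon^2)}{2\varepsilon^2\sigma_k^2}\Big\},$$

where $A(x_k,x_k^*)$ does not depend on $f_k$ and $\theta$. The maximizer of $p_\theta(x_k,x_k^*,f_k)$ over $f_k$ has the form

$$f_k^*(\theta) = \sqrt2\lambda_k\int_{-1/2}^{1/2}\cos[2\pi k(t-\theta)]\,x^\varepsilon(t)\,dt,$$

where

$$\lambda_k = \frac{\sigma_k^2}{\sigma_k^2+\varepsilon^2}, \tag{18}$$

and

$$\max_{f_k}p_\theta(x_k,x_k^*,f_k) = p_\theta(x_k,x_k^*,f_k^*(\theta)) = A(x_k,x_k^*)\exp\Big\{\frac{\lambda_k}{\varepsilon^2}\Big(\int_{-1/2}^{1/2}\cos[2\pi k(t-\theta)]\,x^\varepsilon(t)\,dt\Big)^2\Big\}.$$

Set

$$\hat\theta_{PML} = \arg\max_{\tau\in\Theta}\prod_{k=1}^\infty p_\tau(x_k,x_k^*,f_k^*(\tau)) = \arg\max_{\tau\in\Theta}\prod_{k=1}^\infty\max_{f_k}p_\tau(x_k,x_k^*,f_k),$$

where $\Theta$ is a parameter set associated with the model. Thus, $\hat\theta_{PML}$ is the $\theta$-component of the overall maximum likelihood estimator corresponding to the infinite product density $\prod_{k=1}^\infty p_\theta(x_k,x_k^*,f_k)$. In view of (18), we may write this estimator as

$$\hat\theta_{PML} = \arg\max_{\tau\in\Theta}\Big\{\sum_{k=1}^\infty\lambda_k\Big(\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big)^2\Big\} \tag{19}$$

or as

$$\hat\theta_{PML} = \arg\max_{\tau\in\Theta}\ \max_{\{g_k\}}\Big[\sqrt2\sum_{k=1}^\infty g_k\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt - \sum_{k=1}^\infty g_k^2\,\frac{\varepsilon^2+\sigma_k^2}{2\sigma_k^2}\Big], \tag{20}$$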
where $\max_{\{g_k\}}$ denotes the maximum over sequences $\{g_k\}$ belonging to a subset of $\ell_2$, and we suppose that $f$ satisfies conditions such that the infinite sums converge almost surely. We will call $\hat\theta_{PML}$ a penalized maximum likelihood estimator (PMLE), although this is not a PMLE in the usual sense. Comparing $\hat\theta_{PML}$ with the maximum likelihood estimator $\hat\theta_{ML}$, we see that $\hat\theta_{PML}$ can be interpreted as a penalized version of $\hat\theta_{ML}$ corresponding to a function $f(\cdot) = f_\tau(\cdot)$ whose Fourier coefficients are the maximizers $\{g_k(\tau)\}$ of the term in square brackets in (20) over $\{g_k\}$ for fixed $\tau$, and to the penalty $\sum_{k=1}^\infty(g_k(\tau))^2/\sigma_k^2$, up to a multiplicative constant (cf. the definition of $\hat\theta_{ML}$). Thus, $\hat\theta_{PML}$ differs from the "pure" PMLE in that $f(\cdot) = f_\tau(\cdot)$ is not fixed and known: it depends on the parameter $\tau$ over which the maximization is carried out.
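A minimal sketch of the estimator (19) computed from the sequence observations (17) follows. The sums are truncated at a hypothetical $K$, the prior variances $\sigma_k^2$ are a hypothetical choice, and the argmax over $\Theta$ is replaced by a grid search; the helper `pmle` is introduced here for illustration only.

```python
import numpy as np

def pmle(x, x_star, lam, tau_grid):
    """Grid maximizer of the criterion in (19), written in sequence space.

    int cos[2 pi k (t - tau)] x^eps(t) dt = (x_k cos(2 pi k tau) + x*_k sin(2 pi k tau)) / sqrt(2),
    where x_k, x*_k are the sqrt(2)-normalized cosine and sine coefficients of x^eps.
    """
    k = np.arange(1, x.size + 1)
    phase = 2 * np.pi * np.outer(tau_grid, k)                 # shape (n_tau, K)
    proj = (x * np.cos(phase) + x_star * np.sin(phase)) / np.sqrt(2)
    crit = proj**2 @ lam                                      # sum_k lambda_k (...)^2
    return tau_grid[np.argmax(crit)]

rng = np.random.default_rng(2)
K, eps, theta, tau0 = 60, 0.01, 0.13, 0.24          # tau0 < 1/4, cf. Assumption A1 below
k = np.arange(1, K + 1)
f = k ** -3.0                                       # hypothetical true coefficients
x = f * np.cos(2 * np.pi * k * theta) + eps * rng.standard_normal(K)       # (17)
x_star = f * np.sin(2 * np.pi * k * theta) + eps * rng.standard_normal(K)  # (17)

sigma2 = k ** -6.0                                  # hypothetical prior variances sigma_k^2
lam = sigma2 / (sigma2 + eps**2)                    # lambda_k from (18)
tau_grid = np.linspace(-tau0, tau0, 4801)
theta_hat = pmle(x, x_star, lam, tau_grid)
print(theta, theta_hat)
```

A grid search is used because the criterion in (19) is a trigonometric polynomial in $\tau$ that may have several local maxima on $\Theta$, so a local optimizer started far from $\theta$ could stop at the wrong one.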

To make the estimator $\hat\theta_{PML}$ feasible, it is natural to consider only finite sums in (19), (20), including the terms with $k\le N_\varepsilon$ for some $N_\varepsilon$ that depends on $\varepsilon$ and tends to $\infty$ as $\varepsilon\to0$. In particular, this will be the case for the second-order minimax estimator that we derive below.

Note that the estimator (12) defined in Section 2 is nothing but a local version of the estimator (19) in a neighborhood of $\theta=0$. In fact, differentiating formally the expression in curly brackets in (19), we obtain that $\hat\theta_{PML}$ is a solution of the equation

$$\sum_{k=1}^\infty\lambda_k(2\pi k)\Big(\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big)\Big(\int_{-1/2}^{1/2}\sin[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big) = 0. \tag{21}$$

The integrals in (21) are equal to $y_k/\sqrt2$ and $y_k^*/\sqrt2$, respectively, where $y_k = x_k\cos(2\pi k\tau)+x_k^*\sin(2\pi k\tau)$ and $y_k^* = x_k^*\cos(2\pi k\tau)-x_k\sin(2\pi k\tau)$. This allows one to reduce (21) to

$$\sum_{k=1}^\infty\lambda_k(2\pi k)\big\{x_kx_k^*\cos(4\pi k\tau) - [x_k^2-(x_k^*)^2]\sin(4\pi k\tau)/2\big\} = 0.$$

Linearizing this equation in the vicinity of $\tau=0$, we get the following approximate formula for a solution of (21):

$$\hat\theta_{PML}\approx\sum_{k=1}^\infty(2\pi k)\lambda_kx_kx_k^*\Big/\sum_{k=1}^\infty(2\pi k)^2\lambda_k\big[x_k^2-(x_k^*)^2\big]. \tag{22}$$

It can be shown, using the argument from Section 2, that (22) is asymptotically analogous to the estimator $\theta^*$ given by (12) with $h_k=\lambda_k$. One difference is that here we have $x_k, x_k^*$ instead of $X_k, X_k^*$, but $x_k\approx X_k$ and $x_k^*\approx X_k^*$ for $\theta$ close to 0. Another point is that these estimators have somewhat different denominators. However, for small $\theta$ both denominators estimate the same quadratic functional $\sum_{k=1}^\infty(2\pi k)^2\lambda_kf_k^2$, and one can show that they are quite close to each other, so that their difference does not appear in the second-order asymptotics of the risk.
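The linearization (22) can be compared with the exact grid maximizer of (19) when $\theta$ is close to 0. In this sketch the weights $\lambda_k = f_k^2/(f_k^2+\varepsilon^2)$ (that is, $\sigma_k^2 = f_k^2$), the coefficients and the noise level are hypothetical; the two estimates should agree to within a small fraction of their statistical error.

```python
import numpy as np

rng = np.random.default_rng(3)
K, eps, theta = 60, 0.01, 0.004                 # theta close to 0, where (22) is meant to apply
k = np.arange(1, K + 1)
w = 2 * np.pi * k
f = k ** -3.0                                   # hypothetical coefficients
x = f * np.cos(w * theta) + eps * rng.standard_normal(K)        # (17)
x_star = f * np.sin(w * theta) + eps * rng.standard_normal(K)   # (17)
lam = f**2 / (f**2 + eps**2)                    # hypothetical weights (sigma_k^2 = f_k^2)

# Linearized solution (22).
theta_lin = np.sum(w * lam * x * x_star) / np.sum(w**2 * lam * (x**2 - x_star**2))

# Maximizer of the criterion in (19) on a fine grid around 0; the constant factor 1/2
# coming from the sqrt(2) normalization does not change the argmax.
tau = np.linspace(-0.05, 0.05, 20001)
phase = np.outer(tau, w)
crit = (x * np.cos(phase) + x_star * np.sin(phase)) ** 2 @ lam
theta_grid = tau[np.argmax(crit)]
print(theta_lin, theta_grid)
```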

4. Second-order asymptotics of the estimators. In this section we consider the class of estimators defined by

$$\hat\theta_{AD} = \arg\max_{\tau\in\Theta}\Big\{\sum_{k=1}^\infty h_k\Big(\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big)^2\Big\}, \tag{23}$$

where $\{h_k\}$ is a sequence of real numbers satisfying some general conditions. For the particular choice $h_k=\lambda_k$ the estimator $\hat\theta_{AD}$ is equal to the penalized maximum likelihood estimator (19) obtained from a Bayesian argument with $\lambda_k = \sigma_k^2/(\sigma_k^2+\varepsilon^2)$, but we also allow other weights $h_k$. In particular, weights $\{h_k\}$ such that $h_k=1$ for some initial values of $k$ play an important role in our further argument, while we always have $\lambda_k<1$ for $\hat\theta_{PML}$.

We will show that under some assumptions on $\{h_k\}$ the estimator $\hat\theta_{AD}$ is S-adaptive, and we will give explicit second-order asymptotics for the risk of $\hat\theta_{AD}$. In what follows we will suppose that $h_k\ne0$ for only a finite (typically depending on $\varepsilon$ and growing to $\infty$ as $\varepsilon\to0$) number of integers $k$. This assumption is natural, since otherwise the estimator $\hat\theta_{AD}$ is not feasible. In order not to specify the set where $h_k\ne0$, we keep the sums over all integers $k$ in the notation.

We first define the parameter set where $\theta$ lies. Since $f$ is symmetric and periodic with period 1, the function $s(t) = f(1/2-t)$ is also symmetric and periodic with period 1. Hence, the observations $x^\varepsilon(t)$ corresponding to the parameters $(\theta, f(\cdot))$ and $(\theta-1/2, s(\cdot))$ have the same probability distribution. So we cannot discriminate between the values $\theta, \theta+1/2, \theta+1,\ldots$ in model (1) if we only suppose that $f$ belongs to the class of symmetric and periodic functions with period 1. For the model to be identifiable, $\Theta$ should be strictly included in an interval of length 1/2. For definiteness, we assume the following.

Assumption A1. $\Theta = \{\theta: |\theta|\le\tau_0\}$, where $0<\tau_0<1/4$.
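The non-identifiability argument above can be verified numerically: for a symmetric 1-periodic $f$, the signals corresponding to $(\theta, f)$ and $(\theta-1/2, s)$ with $s(t) = f(1/2-t)$ coincide. The function below is a hypothetical truncated series of the form (7), used only for illustration.

```python
import numpy as np

K_TERMS = np.arange(1, 50)

def f(t):
    """Hypothetical symmetric 1-periodic signal of the form (7) with f_k = k^{-3}."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(2) * np.sum(K_TERMS[:, None] ** -3.0 * np.cos(2 * np.pi * np.outer(K_TERMS, t)), axis=0)

def s(t):
    """s(t) = f(1/2 - t)."""
    return f(0.5 - np.asarray(t, dtype=float))

t = np.linspace(-0.5, 0.5, 1001)
theta = 0.1
signal_theta = f(t - theta)               # signal for the parameters (theta, f)
signal_shifted = s(t - (theta - 0.5))     # signal for the parameters (theta - 1/2, s)
print(np.max(np.abs(signal_theta - signal_shifted)))
```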

Next, we define the class of functions $F$ where $f$ lies. Let $\rho$ and $C_0$ be positive constants. Denote by $F = F(\rho, C_0)$ the class of all functions $f:[-1/2,1/2]\to\mathbb R$ that admit the Fourier expansion (7) with coefficients $f_k$ satisfying the following assumptions.

Assumption A2. $|f_1|\ge\rho$.

Assumption A3. $\sum_{k=1}^\infty(2\pi k)^4f_k^2\le C_0$.
Here and in the sequel, for a sequence of real numbers $\{a_k\}$, we use the notation

$$\|a\|^2 = \sum_{k=1}^\infty a_k^2,\qquad \|a'\|^2 = \sum_{k=1}^\infty a_k^2(2\pi k)^2,\qquad \|a''\|^2 = \sum_{k=1}^\infty a_k^2(2\pi k)^4.$$

Assumptions A2 and A3 imply that

$$C_0\ge\|f'\|^2\ge(2\pi)^2\rho^2\qquad\forall f\in F. \tag{24}$$

Furthermore, we impose some conditions on the weight sequence $\{h_k\}$, assuming that it depends on $\varepsilon$.

Assumption B. The weight sequence $\{h_k\}$ is such that $h_1=1$, $0\le h_k\le1$ for all $k$, and

B1. $\|h'\|\ge\rho_1\log^2(\varepsilon^{-1})\max_k h_k(2\pi k)$, where $\rho_1>0$ is a constant that does not depend on $\varepsilon$,

B2. $\varepsilon^2\sum_{k=1}^\infty h_k^2(2\pi k)^4\le C_1$, where $C_1$ is a constant that does not depend on $\varepsilon$.

We remark that the condition $0\le h_k\le1$ here is quite natural: if $h_k\notin[0,1]$, projecting $h_k$ on $[0,1]$ only improves the second-order asymptotics (cf. the expression for $R^\varepsilon[f,h]$ in (16)). Note also that Assumption B2 and the fact that $0\le h_k\le1$ imply the finiteness of $\|h'\|$ for any $\varepsilon$. Assumptions B1 and B2 are not very restrictive. For example, consider the projection weights $h_k=\mathbb 1\{k\le N_\varepsilon\}$, where $N_\varepsilon$ is an integer such that $N_\varepsilon\to\infty$ as $\varepsilon\to0$. Then Assumption B1 is equivalent to $N_\varepsilon\ge C\log^4(\varepsilon^{-1})$ for some constant $C>0$, and Assumption B2 is satisfied if $N_\varepsilon = O(\varepsilon^{-2/5})$ as $\varepsilon\to0$.
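For projection weights, the quantity controlled by Assumption B2 can be computed directly. In this sketch $N_\varepsilon = \lfloor\varepsilon^{-2/5}\rfloor$, which is one hypothetical choice with $N_\varepsilon = O(\varepsilon^{-2/5})$; the values stay bounded as $\varepsilon$ decreases, in line with the remark above.

```python
import numpy as np

def b2_quantity(h, eps):
    """eps^2 * sum_k h_k^2 (2 pi k)^4, the quantity bounded by C_1 in Assumption B2."""
    k = np.arange(1, h.size + 1)
    return eps**2 * np.sum(h**2 * (2 * np.pi * k) ** 4)

values = {}
for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    N = int(np.floor(eps ** -0.4))   # N_eps = floor(eps^(-2/5))
    h = np.ones(N)                   # projection weights 1{k <= N}, stored up to k = N
    values[eps] = b2_quantity(h, eps)
    print(f"eps={eps:.0e}  N={N:5d}  B2 quantity={values[eps]:.1f}")
```

The bound used in the check follows from $\sum_{k\le N}k^4\le(N+1)^5/5$ and $\varepsilon^2(N_\varepsilon+1)^5\le(1+\varepsilon^{2/5})^5\le32$ for $0<\varepsilon<1$.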

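Finally, we will need the following assumption involving both $f$ and $\{h_k\}$.

Assumption C. The weight sequence $\{h_k\}$ is such that, uniformly in $f\in F$,

$$\Big[\sum_{k=1}^\infty(1-h_k)(2\pi k)^2f_k^2\Big]^2 = o\Big(\sum_{k=1}^\infty(1-h_k)^2(2\pi k)^2f_k^2\Big)\qquad\text{as }\varepsilon\to0.$$

Note that, again, Assumption C is quite mild. For the projection weights $h_k=\mathbb 1\{k\le N_\varepsilon\}$ it means that $\sum_{k>N_\varepsilon}(2\pi k)^2f_k^2\to0$ as $\varepsilon\to0$, uniformly in $f\in F$, which is true due to Assumption A3, since $\sum_{k>N_\varepsilon}(2\pi k)^2f_k^2\le C_0/(2\pi N_\varepsilon)^2$.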

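Theorem 1. Let Assumptions A1-A3, B and C be satisfied. Then, uniformly in $f\in F$ and in $\theta\in\Theta$,

$$E_{\theta,f}\big[(\hat\theta_{AD}-\theta)^2I^\varepsilon(f)\big] = 1+(1+o(1))\frac{R^\varepsilon[f,h]}{\|f'\|^2}\qquad\text{as }\varepsilon\to0,$$

where the functional $R^\varepsilon[f,h]$ is defined in (16).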
Proof of Theorem 1 is given in Section 7.

Assumptions A3, B and C imply that

$$\sup_{f\in F}R^\varepsilon[f,h] = o(1)\qquad\text{as }\varepsilon\to0.$$

In fact, it follows from Assumptions B1 and B2 that $\varepsilon^2\sum_{k=1}^\infty h_k^2(2\pi k)^2 = o(1)$, while Assumptions A3 and C yield $\sum_{k=1}^\infty(1-h_k)^2(2\pi k)^2f_k^2 = o(1)$ as $\varepsilon\to0$. Thus, Theorem 1 shows that $\hat\theta_{AD}$ has the same first-order asymptotics as the efficient estimator $\hat\theta_{ML}$ [cf. (2)], that is, $\hat\theta_{AD}$ is S-adaptive under the assumptions of Theorem 1. But Theorem 1 says more than that, because it also provides an asymptotically exact second-order expansion for the risk of $\hat\theta_{AD}$.



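5. Minimax problem for second-order term. It follows from Theorem 1 that the second-order term of the risk of $\hat\theta_{AD}$ depends on the coefficients $\{h_k\}$ only via the functional $R^\varepsilon[f,h]$. We would like to make this term as small as possible by minimizing it over $h_k$. Since we do not know the nuisance parameters $f_k$, we consider a minimax setting: we look for $h=\{h_k\}$ that minimizes the maximum of the functional $R^\varepsilon[f,h]$ over a suitably chosen set of sequences $\{f_k\}$. Namely, we consider a Sobolev ball

$$W(\beta,L) = \Big\{f=\{f_k\}:\ \sum_{k=1}^\infty(2\pi k)^{2\beta}f_k^2\le L\Big\},$$

where $\beta>1$ and $L>0$ are given constants. A minimax sequence of weights $q=\{q_k\}\in\ell_2$ is defined by

$$\sup_{f\in W(\beta,L)}R^\varepsilon[f,q] = \inf_{h\in\ell_2}\ \sup_{f\in W(\beta,L)}R^\varepsilon[f,h].$$

It is well known (see, e.g., [1] or [23]) that such a sequence $q$ exists and that it has the form

$$q_k = \Big(1-\Big(\frac{k}{W_\varepsilon}\Big)^{\beta-1}\Big)_+, \tag{25}$$

where $x_+ = \max(x,0)$ and $W_\varepsilon$ is a solution of the equation

$$\varepsilon^2\sum_{k=1}^\infty(2\pi k)^{2\beta}\Big[\Big(\frac{W_\varepsilon}{k}\Big)^{\beta-1}-1\Big]_+ = L. \tag{26}$$

As $\varepsilon\to0$, we have

$$W_\varepsilon = \Big(\frac{L}{\varepsilon^2}\cdot\frac{(\beta+2)(2\beta+1)}{(2\pi)^{2\beta}(\beta-1)}\Big)^{1/(2\beta+1)}(1+o(1)). \tag{27}$$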
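Equation (26) has no closed-form solution, but its left-hand side is continuous and nondecreasing in $W_\varepsilon$, so bisection applies. The sketch below, with hypothetical values $\beta = 2$ and $L = 1$, solves (26) numerically and compares the result with the leading term of (27).

```python
import numpy as np

def lhs_26(W, eps, beta):
    """Left-hand side of (26); only indices k < W contribute."""
    k = np.arange(1, int(np.ceil(W)) + 1)
    return eps**2 * np.sum((2 * np.pi * k) ** (2 * beta) * np.clip((W / k) ** (beta - 1) - 1, 0, None))

def solve_W(eps, beta, L, n_iter=100):
    """Bisection for W_eps in (26); lhs_26 is continuous and nondecreasing in W."""
    lo, hi = 1.0, 2.0
    while lhs_26(hi, eps, beta) < L:
        hi *= 2
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if lhs_26(mid, eps, beta) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def W_asymptotic(eps, beta, L):
    """Leading term of (27)."""
    c = (beta + 2) * (2 * beta + 1) / ((2 * np.pi) ** (2 * beta) * (beta - 1))
    return (L * c / eps**2) ** (1 / (2 * beta + 1))

beta, L = 2.0, 1.0
ratios = {}
for eps in [1e-4, 1e-6, 1e-8]:
    W = solve_W(eps, beta, L)
    ratios[eps] = W / W_asymptotic(eps, beta, L)
    print(f"eps={eps:.0e}  W_eps={W:9.3f}  ratio to (27)={ratios[eps]:.5f}")
```

The discrepancy at larger $\varepsilon$ comes from replacing the sum in (26) by an integral, which is accurate only when $W_\varepsilon$ is large.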
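Moreover, the functional $R^\varepsilon[f,h]$ has a saddle point on $W(\beta,L)\times\ell_2$ (cf. [1], [23] or [29], Chapter 3) with components $s, q$, where $s=\{s_k\}$ is any sequence satisfying

$$s_k^2 = \varepsilon^2\frac{q_k}{1-q_k} = \varepsilon^2\Big[\Big(\frac{W_\varepsilon}{k}\Big)^{\beta-1}-1\Big]_+. \tag{28}$$

The existence of a saddle point at $(s,q)$ means that

$$\inf_{h\in\ell_2}\ \sup_{f\in W(\beta,L)}R^\varepsilon[f,h] = \sup_{f\in W(\beta,L)}\ \inf_{h\in\ell_2}R^\varepsilon[f,h] = R^\varepsilon[s,q].$$

Using (25), (26) and (28), the value $R^\varepsilon[s,q]$ can be expressed explicitly: by (28), $(1-q_k)^2s_k^2 = \varepsilon^2q_k(1-q_k)$, so the $k$th term of (16) at $(s,q)$ equals $(2\pi k)^2\varepsilon^2q_k$. This yields

$$\inf_{h\in\ell_2}\ \sup_{f\in W(\beta,L)}R^\varepsilon[f,h] = \sup_{f\in W(\beta,L)}R^\varepsilon[f,q] = \varepsilon^2\sum_{k=1}^\infty(2\pi k)^2q_k = r_\varepsilon^2. \tag{29}$$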
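The saddle-point statements (28)-(29) can be checked numerically: at $(s,q)$ the functional equals $\varepsilon^2\sum_k(2\pi k)^2q_k$, $s$ lies on the boundary of $W(\beta,L)$, perturbing $h$ away from $q$ increases $R^\varepsilon[s,h]$, and another point of the Sobolev ball does not give a larger value of $R^\varepsilon[\cdot,q]$. The parameter values and the perturbations are hypothetical, and the block includes its own small bisection solver for (26).

```python
import numpy as np

def solve_W(eps, beta, L, n_iter=100):
    """Bisection for W_eps in (26)."""
    def lhs(W):
        k = np.arange(1, int(np.ceil(W)) + 1)
        return eps**2 * np.sum((2 * np.pi * k) ** (2 * beta) * np.clip((W / k) ** (beta - 1) - 1, 0, None))
    lo, hi = 1.0, 2.0
    while lhs(hi) < L:
        hi *= 2
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < L else (lo, mid)
    return 0.5 * (lo + hi)

def R(f, h, eps):
    """Second-order functional R^eps[f, h] of (16)."""
    w2 = (2 * np.pi * np.arange(1, f.size + 1)) ** 2
    return np.sum(w2 * ((1 - h) ** 2 * f**2 + eps**2 * h**2))

beta, L, eps, K = 2.0, 1.0, 1e-6, 2000
k = np.arange(1, K + 1)
W = solve_W(eps, beta, L)
q = np.clip(1 - (k / W) ** (beta - 1), 0, None)                  # (25)
s = eps * np.sqrt(np.clip((W / k) ** (beta - 1) - 1, 0, None))   # (28), nonnegative choice of s_k
r2 = eps**2 * np.sum((2 * np.pi * k) ** 2 * q)                   # (29)
R_sq = R(s, q, eps)
sobolev_norm = np.sum((2 * np.pi * k) ** (2 * beta) * s**2)      # equals L by (26)

# Moving h away from q does not decrease R[s, .] ...
perturbed = [R(s, np.clip(q + d * np.sin(k), 0, 1), eps) for d in (1e-3, -1e-3, 1e-2)]
# ... and another point of W(beta, L) (all mass on k = 1) does not give a larger R[., q].
other = np.zeros(K)
other[0] = np.sqrt(L) / (2 * np.pi) ** beta
print(R_sq, r2, sobolev_norm, min(perturbed), R(other, q, eps))
```

In this example the point concentrated on $k=1$ gives the same value as $s$ up to rounding, which reflects the fact that $R^\varepsilon[\cdot,q]$ is maximized by many points of the ball, not only by $s$.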

Note finally that, as $\varepsilon\to0$,

$$r_\varepsilon^2 = \frac{(2\pi)^2(\beta-1)}{3(\beta+2)}\,\varepsilon^2W_\varepsilon^3(1+o(1)) = C^*(\beta,L)\,\varepsilon^{(4\beta-4)/(2\beta+1)}(1+o(1)), \tag{30}$$

where

$$C^*(\beta,L) = \frac{(2\pi)^2(\beta-1)}{3(\beta+2)}\Big(\frac{L(\beta+2)(2\beta+1)}{(2\pi)^{2\beta}(\beta-1)}\Big)^{3/(2\beta+1)}.$$

The rate $\varepsilon^{(4\beta-4)/(2\beta+1)}$ in (30) characterizes the ratio of second-order terms to first-order terms in the asymptotic expansion for the nonnormalized risk $E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2]$. This ratio is not dramatically small for $\beta$ not too large; for example, it equals $\varepsilon^{4/5}$ for $\beta=2$. Thus, the second-order terms might be comparable with the first-order ones. In absolute value, the first-order term of the nonnormalized risk decreases as $\varepsilon^2$ and the second-order term as $\varepsilon^{(8\beta-2)/(2\beta+1)}$.
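The asymptotic formula (30) can be compared with the exact minimax value (29) computed from a numerically solved $W_\varepsilon$. This sketch uses hypothetical values $\beta = 2$, $L = 1$ and a few values of $\varepsilon$; the ratio should approach 1 as $\varepsilon$ decreases.

```python
import numpy as np

def solve_W(eps, beta, L, n_iter=100):
    """Bisection for W_eps in (26)."""
    def lhs(W):
        k = np.arange(1, int(np.ceil(W)) + 1)
        return eps**2 * np.sum((2 * np.pi * k) ** (2 * beta) * np.clip((W / k) ** (beta - 1) - 1, 0, None))
    lo, hi = 1.0, 2.0
    while lhs(hi) < L:
        hi *= 2
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < L else (lo, mid)
    return 0.5 * (lo + hi)

def C_star(beta, L):
    """Constant C*(beta, L) in (30)."""
    a = (2 * np.pi) ** 2 * (beta - 1) / (3 * (beta + 2))
    b = L * (beta + 2) * (2 * beta + 1) / ((2 * np.pi) ** (2 * beta) * (beta - 1))
    return a * b ** (3 / (2 * beta + 1))

beta, L = 2.0, 1.0
rate = (4 * beta - 4) / (2 * beta + 1)          # exponent in (30); 4/5 for beta = 2
ratios = {}
for eps in [1e-4, 1e-6, 1e-8]:
    W = solve_W(eps, beta, L)
    k = np.arange(1, int(np.ceil(W)) + 1)
    r2 = eps**2 * np.sum((2 * np.pi * k) ** 2 * np.clip(1 - (k / W) ** (beta - 1), 0, None))  # (29)
    ratios[eps] = r2 / (C_star(beta, L) * eps**rate)
    print(f"eps={eps:.0e}  r_eps^2={r2:.3e}  ratio to (30)={ratios[eps]:.4f}")
```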



6. Locally minimax lower bound and second-order efficiency. In this section we obtain a lower bound for the minimax risk and construct a second-order efficient estimator of $\theta$.

Let $f$ be a fixed function from $F(\rho,C_0)$ with the Fourier coefficients denoted by $f_k$. For $\delta>0$ define a vicinity of $f$ by

$$F_\delta(f) = \{\bar f = f+v:\ \|v\|\le\delta,\ v\in W(\beta,L)\}. \tag{31}$$

It is assumed that $\beta\ge2$. Recall that $\|f''\|<\infty$ since $f\in F(\rho,C_0)$ (cf. Assumption A3). If $\delta$ is small enough, $F_\delta(f)\subset F(\rho',C_0')$ for some $\rho'>0$, $C_0'>0$ depending only on $\rho$, $C_0$, $L$.




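Theorem 2. Let the real number $\delta=\delta_\varepsilon$ be such that $\lim_{\varepsilon\to0}\delta_\varepsilon=0$ and $\lim_{\varepsilon\to0}\delta_\varepsilon^2/(\varepsilon^2W_\varepsilon^{2+\alpha})=\infty$ for some $\alpha>0$, where $W_\varepsilon$ satisfies (27). Then, as $\varepsilon\to0$,

$$\inf_{\hat\theta_\varepsilon}\ \sup_{\theta\in\Theta,\ \bar f\in F_{\delta_\varepsilon}(f)}E_{\theta,\bar f}\big[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(\bar f)\big]\ \ge\ 1+(1+o(1))\frac{r_\varepsilon^2}{\|f'\|^2}. \tag{32}$$

Here and in what follows $\inf_{\hat\theta}$ (or $\inf_{\hat\theta_\varepsilon}$) is the infimum over all estimators based on the observation $X^\varepsilon$, and $r_\varepsilon^2$ is the minimax value defined in (29).

The proof of Theorem 2 is given in Section 8.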

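Motivated by the above results, we introduce the following notion of semiparametric second-order efficiency.

Definition 1. An estimator $\theta_\varepsilon^*$ is called second-order efficient at $f\in F$ if

$$\sup_{\theta\in\Theta,\ \bar f\in F_{\delta_\varepsilon}(f)}E_{\theta,\bar f}\big[(\theta_\varepsilon^*-\theta)^2I^\varepsilon(\bar f)\big] = 1+(1+o(1))\frac{r_\varepsilon^2}{\|f'\|^2}\qquad\text{as }\varepsilon\to0, \tag{33}$$

for some $\delta_\varepsilon>0$ such that $\lim_{\varepsilon\to0}\delta_\varepsilon=0$.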

Comparing Theorems 1 and 2 we see that if there exists a sequence of 
weights hk = for which Assumptions B and C are satisfied and 

(34) sup R'[f,X*]<r'{l + oil)), 

where A* = {A^}, then the estimator ^ad with this choice of weights is 
second-order efficient. At first sight, it seems that one can take A^ = qk 
from (25). However, for hk = Qk Assumption C is not fulfilled. Therefore we 
correct Qk, taking 

k 



1 



k>JeWe, 



where $W_\varepsilon$ is a solution of (26) and $\gamma_\varepsilon = 1/\log(\varepsilon^{-2})$. For $k > \gamma_\varepsilon W_\varepsilon$, the weights $\lambda_k^*$ induce a prior on $\{f_k\}$ analogous to the one that appears in the proof of the lower bound of Theorem 2. The corresponding penalized maximum likelihood type estimator has the form

$$\hat\theta_{PML} = \arg\max_{\tau\in\Theta} \sum_{k=1}^\infty \lambda_k^*\Big(\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big)^2. \tag{35}$$
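The estimator (35) only needs the Fourier coefficients of the observations: $\sqrt2\int\cos[2\pi k(t-\tau)]x^\varepsilon(t)\,dt = x_k\cos(2\pi k\tau) + x_k^*\sin(2\pi k\tau)$ with $x_k = f_k\cos(2\pi k\theta)+\varepsilon\xi_k$ and $x_k^* = f_k\sin(2\pi k\theta)+\varepsilon\xi_k^*$, and the constant factor $\sqrt2$ does not change the maximizer. The Python sketch below (NumPy assumed) simulates these coefficients for a hypothetical signal $f_k = k^{-3}$, truncates the series at $K = 50$, replaces the continuous maximization by a grid search over $[-0.2, 0.2]$, and uses illustrative Pinsker-type weights $(1-(k/W)^{\beta-1})_+$ with $h_1 = 1$. These weights are an assumption made for the example; they are not the corrected weights $\lambda_k^*$ of the text.

```python
import numpy as np


def pml_estimate(x, x_star, weights, grid):
    """Grid maximizer of sum_k w_k (x_k cos(2 pi k tau) + x*_k sin(2 pi k tau))^2."""
    k = np.arange(1, len(x) + 1)
    phase = 2 * np.pi * np.outer(grid, k)              # shape (len(grid), K)
    proj = x * np.cos(phase) + x_star * np.sin(phase)  # projections for every tau
    contrast = np.sum(weights * proj**2, axis=1)
    return grid[np.argmax(contrast)]


rng = np.random.default_rng(0)
K, eps, theta = 50, 0.01, 0.1
k = np.arange(1, K + 1)
f = k ** -3.0                                          # hypothetical f_k, with f_1 = 1
x = f * np.cos(2 * np.pi * k * theta) + eps * rng.standard_normal(K)
x_star = f * np.sin(2 * np.pi * k * theta) + eps * rng.standard_normal(K)

W, beta = 10.0, 2.0                                    # hypothetical bandwidth and smoothness
weights = np.clip(1.0 - (k / W) ** (beta - 1), 0.0, None)
weights[0] = 1.0                                       # keep h_1 = 1
grid = np.linspace(-0.2, 0.2, 4001)
theta_hat = pml_estimate(x, x_star, weights, grid)
print(theta_hat)
```

With $\varepsilon = 0.01$ the estimate should land close to the true shift $\theta = 0.1$; the grid only has to avoid the spurious maxima at $\theta \pm 1/2$, which lie outside $[-0.2, 0.2]$.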
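Theorem 3. Let a function $\bar f \in \mathcal F$ be such that, for some $\beta' > \beta > 1$,

$$\sum_{k=1}^\infty (2\pi k)^{2\beta'}\bar f_k^2 < \infty, \tag{36}$$

and $\lim_{\varepsilon\to0}\delta_\varepsilon = 0$, $\lim_{\varepsilon\to0}\delta_\varepsilon^2/(\varepsilon^2W_\varepsilon^{1+\alpha}) = \infty$ for some $\alpha > 0$, where $W_\varepsilon$ satisfies (27). Then, as $\varepsilon \to 0$, the local asymptotic minimax risk admits the second-order expansion

$$\inf_{\hat\theta_\varepsilon}\ \sup_{\theta\in\Theta,\, f\in\mathcal F_{\delta_\varepsilon}(\bar f)} \mathbf E_{\theta,f}\big[(\hat\theta_\varepsilon-\theta)^2 I^\varepsilon(f)\big] = 1 + (1+o(1))\frac{r^\varepsilon}{\|\bar f'\|^2}. \tag{37}$$

Moreover, the estimator $\hat\theta_{PML}$ defined in (35) is second-order efficient at $\bar f$.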



The proof of Theorem 3 is given in Section 9. 



Remark 1. Theorems 2 and 3 are local in $f$ and nonlocal in $\theta$. Inspection of the proofs shows that they can be turned into local ones in $\theta$ as well, that is, that one can replace $\sup_{\theta\in\Theta}$ by $\sup_{|\theta-\theta_0|<t}$, where $t > 0$ is a small number (fixed or tending to 0 with $\varepsilon$ not too fast) and $\theta_0$ is an interior point of $\Theta$.

Remark 2. In the argument of Section 3, $\lambda_k = \sigma_k^2/(\sigma_k^2+\varepsilon^2)$. The values $(\sigma_k^*)^2$ corresponding to $\lambda_k^*$ for $k > \gamma_\varepsilon W_\varepsilon$ are thus

$$(\sigma_k^*)^2 = \frac{\varepsilon^2\lambda_k^*}{1-\lambda_k^*}.$$

One can interpret these $(\sigma_k^*)^2$ as variances of the prior distributions of the $f_k$'s introduced in Section 3. These variances appear also in the proof of the lower bound [cf. (46)]. The fact that the initial values of $\lambda_k^*$ are equal to 1 means that we do not put any prior distribution on the Fourier coefficients $f_k$ for $k \le \gamma_\varepsilon W_\varepsilon$. Note that this is a particular choice of a prior associated with the Sobolev classes of functions.
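The correspondence between weights and prior variances in Remark 2 is one-to-one for $0 \le \lambda < 1$, with $\lambda = 1$ corresponding to an infinite variance, that is, a flat prior. The Python sketch below (NumPy assumed; the function names `prior_variance` and `shrinkage_weight` are illustrative, not from the paper) implements $\sigma^2 = \varepsilon^2\lambda/(1-\lambda)$ and its inverse $\lambda = \sigma^2/(\sigma^2+\varepsilon^2)$ and checks the round trip.

```python
import numpy as np


def prior_variance(lam, eps):
    """Prior variance sigma^2 = eps^2 * lam / (1 - lam); lam = 1 gives +inf (flat prior)."""
    lam = np.asarray(lam, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma2 = eps**2 * lam / (1.0 - lam)
    return np.where(lam >= 1.0, np.inf, sigma2)


def shrinkage_weight(sigma2, eps):
    """Inverse map lam = sigma^2 / (sigma^2 + eps^2); sigma^2 = +inf gives lam = 1."""
    sigma2 = np.asarray(sigma2, dtype=float)
    with np.errstate(invalid="ignore"):
        lam = sigma2 / (sigma2 + eps**2)
    return np.where(np.isinf(sigma2), 1.0, lam)


eps = 0.1
lam = np.array([0.0, 0.25, 0.5, 0.9, 1.0])
sigma2 = prior_variance(lam, eps)
print(sigma2)
print(shrinkage_weight(sigma2, eps))
```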



Remark 3. It is interesting to compare results on nonparametric and semiparametric second-order efficiency. Golubev and Levit [11, 12] and Dalalyan and Kutoyants [6] considered nonparametric problems where there exist $\sqrt n$-consistent first-order efficient estimators (such as estimation of the cumulative distribution function). In these problems there are simple efficient estimators, such as the empirical c.d.f., and smoothed estimators allow one to improve upon these simple estimators, so that the second-order asymptotic terms are always negative. On the contrary, in semiparametric problems, as in the one considered here, simple empirical estimators are not efficient, and one has to use smoothing already to attain first-order efficiency. As we see from Theorems 1-3 (cf. also Golubev and Härdle [9, 10], who studied partial linear models), in semiparametric problems second-order asymptotic terms are positive, so that they always spoil the asymptotics. This suggests that the choice of correct smoothing that allows one to optimize second-order asymptotic terms is more important in semiparametrics than in nonparametrics.

7. Proof of Theorem 1. In what follows we use the same notation $C$ for finite positive constants that may be different on different occasions and can depend only on $T_0$, $\rho$, $C_0$, $\rho_1$ and $C_1$.

The first step of the proof of Theorem 1 is to show that the estimator $\hat\theta_{AD}$ is $\varepsilon$-consistent.

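7.1. Consistency of $\hat\theta_{AD}$. The estimator $\hat\theta_{AD}$ is a maximizer of the contrast function

$$\begin{aligned} L(\tau) &= \sum_{k=1}^\infty h_k\Big(\sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-\tau)]\,x^\varepsilon(t)\,dt\Big)^2\\ &= \sum_{k=1}^\infty h_k\big(f_k\cos[2\pi k(\tau-\theta)] + \varepsilon\xi_k(0)\cos(2\pi k\tau) + \varepsilon\xi_k^*(0)\sin(2\pi k\tau)\big)^2\\ &= \sum_{k=1}^\infty h_kf_k^2\cos^2[2\pi k(\tau-\theta)] + 2\varepsilon\|f'\|\eta_1(\tau) + \varepsilon^2\eta_2(\tau), \end{aligned}$$

where $\theta$ is the true value of the parameter,

$$\xi_k(u) = \sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-u)]\,n(t)\,dt,\qquad \xi_k^*(u) = \sqrt2\int_{-1/2}^{1/2}\sin[2\pi k(t-u)]\,n(t)\,dt,$$

and

$$\eta_1(\tau) = \frac{1}{\|f'\|}\sum_{k=1}^\infty h_kf_k\cos[2\pi k(\tau-\theta)]\big(\xi_k(0)\cos(2\pi k\tau)+\xi_k^*(0)\sin(2\pi k\tau)\big),$$

$$\eta_2(\tau) = \sum_{k=1}^\infty h_k\big(\xi_k(0)\cos(2\pi k\tau)+\xi_k^*(0)\sin(2\pi k\tau)\big)^2.$$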
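The last equality in the display for $L(\tau)$ is pure algebra: expanding each square gives the signal part, the cross term $2\varepsilon\|f'\|\eta_1(\tau)$ and the quadratic noise term $\varepsilon^2\eta_2(\tau)$. The Python sketch below (NumPy assumed) checks the identity numerically on a truncated series; the coefficients $f_k = k^{-2}$, the weights $h_k$, the truncation level $K$ and the Gaussian draws standing in for $\xi_k(0), \xi_k^*(0)$ are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, eps, theta = 30, 0.05, 0.07
k = np.arange(1, K + 1)
f = 1.0 / k**2                          # hypothetical Fourier coefficients f_k
h = 1.0 / (1.0 + (k / 8.0) ** 3)        # hypothetical weights h_k
xi = rng.standard_normal(K)             # plays the role of xi_k(0)
xi_star = rng.standard_normal(K)        # plays the role of xi_k^*(0)
norm_fp = np.sqrt(np.sum((2 * np.pi * k) ** 2 * f**2))  # ||f'||


def noise(tau):
    """xi_k(0) cos(2 pi k tau) + xi_k^*(0) sin(2 pi k tau) for all k."""
    return xi * np.cos(2 * np.pi * k * tau) + xi_star * np.sin(2 * np.pi * k * tau)


def contrast_direct(tau):
    """Second line of the display: weighted sum of squared projections."""
    proj = f * np.cos(2 * np.pi * k * (tau - theta)) + eps * noise(tau)
    return np.sum(h * proj**2)


def contrast_decomposed(tau):
    """Third line: signal + 2 eps ||f'|| eta_1(tau) + eps^2 eta_2(tau)."""
    c = np.cos(2 * np.pi * k * (tau - theta))
    signal = np.sum(h * f**2 * c**2)
    eta1 = np.sum(h * f * c * noise(tau)) / norm_fp
    eta2 = np.sum(h * noise(tau) ** 2)
    return signal + 2 * eps * norm_fp * eta1 + eps**2 * eta2


taus = np.linspace(-0.25, 0.25, 101)
max_gap = max(abs(contrast_direct(t) - contrast_decomposed(t)) for t in taus)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, since the two functions differ only in the order of the arithmetic.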

The following three lemmas allow us to control the first derivatives of $\eta_1(\tau)$ and $\eta_2(\tau)$.

Lemma 1. Uniformly in $f \in \mathcal F$ we have

$$\mathbf P\Big\{\sup_{\tau\in\Theta}|\eta_1'(\tau)| > x\Big\} \le c_1\exp(-c_2x^2)\qquad\forall x > 0,$$

where the constants $c_1 > 0$ and $c_2 > 0$ depend only on $\rho$ and $C_0$.
Proof. Note that $\eta_1'(\tau)$ is a stationary Gaussian random process with mean 0 and twice continuously differentiable correlation function $r(\cdot)$ such that $r''(0) \ne 0$. It follows from the Rice formula ([18], Theorem 7.3.2, or [5], Chapter 13.5, page 294; see also Proposition 2 in [28]) that for all $x > 0$,

$$\mathbf P\Big\{\sup_{\tau\in\Theta}|\eta_1'(\tau)| > x\Big\} \le C\Big[\Big(\frac{|r''(0)|}{r(0)}\Big)^{1/2} + 1\Big]\exp\Big(-\frac{x^2}{2r(0)}\Big), \tag{38}$$

where $C > 0$ is a universal constant. Now, since $f \in \mathcal F$,

$$r(0) = \mathbf E[(\eta_1'(\tau))^2] = \|f'\|^{-2}\sum_{k=1}^\infty h_k^2f_k^2(2\pi k)^2 \ge (2\pi)^2\|f'\|^{-2}\rho,$$

$$|r''(0)| = \mathbf E[(\eta_1''(\tau))^2] = \|f'\|^{-2}\sum_{k=1}^\infty h_k^2f_k^2(2\pi k)^4 \le C_0\|f'\|^{-2},$$

which together with (38) proves the lemma. □

We will use the following simple fact about moderate deviations of the random variable

$$\zeta = \sum_{j=1}^\infty a_j(\xi_j^2 - 1),$$

where the $\xi_j$'s are i.i.d. standard normal random variables and $\{a_j\}$ is a sequence belonging to $\ell_2$, so that the random series converges almost surely.

Lemma 2. Let $a \neq 0$, $\{a_k\} \in \ell_2$. For any $0 < x < \|a\|/\max_k|a_k|$ we have

$$\mathbf P\big\{|\zeta| > x\sqrt{\mathbf E[\zeta^2]}\big\} \le 2\exp(-x^2/16).$$

This result follows, for example, from (27) of Lemma 2 in [4].
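To give a concrete feel for Lemma 2, the Python sketch below (NumPy assumed) draws $\zeta = \sum_j a_j(\xi_j^2-1)$ by Monte Carlo for the hypothetical choice $a_j = 1$, $j = 1, \dots, 100$, for which $\|a\|/\max_j|a_j| = 10$, and compares the empirical tail with $2\exp(-x^2/16)$. Here $\mathbf E[\zeta^2] = 2\|a\|^2$ because $\operatorname{Var}(\xi_j^2) = 2$. A simulation can only illustrate the bound, not prove it; note also that the bound is informative only for $x > 4\sqrt{\ln 2} \approx 3.3$.

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.ones(100)                                # hypothetical coefficients a_j
sd = np.sqrt(2.0 * np.sum(a**2))                # sqrt(E[zeta^2]), since Var(xi^2) = 2
x_max = np.linalg.norm(a) / np.max(np.abs(a))   # Lemma 2 requires 0 < x < x_max

n_rep = 20_000
xi = rng.standard_normal((n_rep, a.size))
zeta = (xi**2 - 1.0) @ a                        # one draw of zeta per row

results = []
for x in (4.0, 6.0, 8.0):
    assert 0 < x < x_max
    empirical = np.mean(np.abs(zeta) > x * sd)  # Monte Carlo tail probability
    bound = 2.0 * np.exp(-x**2 / 16.0)          # right-hand side of Lemma 2
    results.append((x, empirical, bound))
    print(f"x = {x}: empirical {empirical:.2e}, bound {bound:.2e}")
```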
Lemma 3. For any $0 < x < \|h'\|/\max_k h_k(2\pi k)$

$$\mathbf P\Big\{\sup_{\tau\in\Theta}|\eta_2'(\tau)| > 4\sum_{k=1}^\infty h_k(2\pi k) + x\|h'\|\Big\} \le 4\exp(-C_3x^2),$$

where $C_3 > 0$ is a universal constant.

Proof. Using the Cauchy-Schwarz inequality we get

$$\begin{aligned}\sup_{\tau}|\eta_2'(\tau)| &\le 2\sum_{k=1}^\infty h_k(2\pi k)\sup_{\tau}\big\{|\xi_k(0)\cos(2\pi k\tau)+\xi_k^*(0)\sin(2\pi k\tau)| \times |-\xi_k(0)\sin(2\pi k\tau)+\xi_k^*(0)\cos(2\pi k\tau)|\big\}\\ &\le 2\sum_{k=1}^\infty h_k(2\pi k)\big[\xi_k^2(0)+\xi_k^{*2}(0)\big].\end{aligned}$$

The rest follows from Lemma 2. □
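The pointwise bound used in the proof of Lemma 3 can be checked on simulated data. The Python sketch below (NumPy assumed) evaluates $\eta_2'(\tau)$ on a fine grid of $\tau \in [-1/4, 1/4]$ for one draw of $(\xi_k(0), \xi_k^*(0))$ and compares its largest absolute value with $2\sum_k h_k(2\pi k)[\xi_k^2(0)+\xi_k^{*2}(0)]$. The weights $h_k$ and the truncation $K$ are hypothetical, and a grid maximum can only under-estimate the true supremum.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 40
k = np.arange(1, K + 1)
h = np.clip(1.0 - k / 30.0, 0.0, None)        # hypothetical weights h_k
xi = rng.standard_normal(K)                   # xi_k(0)
xi_star = rng.standard_normal(K)              # xi_k^*(0)

tau = np.linspace(-0.25, 0.25, 20_001)
phase = 2 * np.pi * np.outer(tau, k)          # shape (len(tau), K)
u = xi * np.cos(phase) + xi_star * np.sin(phase)
du = 2 * np.pi * k * (-xi * np.sin(phase) + xi_star * np.cos(phase))  # d u / d tau
eta2_prime = np.sum(h * 2 * u * du, axis=1)   # derivative of sum_k h_k u_k(tau)^2

sup_on_grid = np.max(np.abs(eta2_prime))
cs_bound = 2 * np.sum(h * 2 * np.pi * k * (xi**2 + xi_star**2))
print(sup_on_grid, cs_bound)
```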



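Consider now the expectation of the contrast function $L(\cdot)$:

$$\mathbf E[L(\tau)] = \sum_{k=1}^\infty h_kf_k^2\cos^2[2\pi k(\tau-\theta)] + \varepsilon^2\sum_{k=1}^\infty h_k,$$

where the last term does not depend on $\tau$.

Lemma 4. Let Assumptions A1-A3 and B be satisfied. Then

$$\mathbf E[L(\tau)] - \mathbf E[L(\theta)] \le -C|\tau-\theta|^2\qquad\forall\tau\in\Theta,$$

where the constant $C > 0$ depends only on $T_0$, $\rho$ and $C_0$.

Proof. The derivatives of the function $G(\tau) = \mathbf E[L(\tau)]$ satisfy $G'(\theta) = 0$, $G''(\theta) = -2\sum_{k=1}^\infty h_kf_k^2(2\pi k)^2 \le -2(2\pi)^2\rho$. Thus, the assertion of the lemma holds for $\tau$ in some neighborhood of $\theta$. Since also $\mathbf E[L(\tau)] < \mathbf E[L(\theta)]$ for all $\tau \in \Theta$, $\tau \ne \theta$, and $\Theta$ is a bounded interval (cf. Assumption A1), the lemma follows. □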
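The curvature argument of Lemma 4 can be illustrated numerically. The Python sketch below (NumPy assumed) uses hypothetical coefficients $f_k = k^{-2}$, hypothetical weights with $h_1 = 1$ and $\theta = 0.05$; it keeps only the $\tau$-dependent part of $\mathbf E[L(\tau)]$, compares a finite-difference estimate of $G''(\theta)$ with $-2\sum_k h_kf_k^2(2\pi k)^2$, and computes the largest constant $C$ for which $G(\tau) - G(\theta) \le -C(\tau-\theta)^2$ holds on a grid of $\tau \in [-0.2, 0.2]$.

```python
import numpy as np

K, theta = 30, 0.05
k = np.arange(1, K + 1)
f = 1.0 / k**2                              # hypothetical f_k
h = 1.0 / (1.0 + (k / 10.0) ** 4)           # hypothetical weights h_k
h[0] = 1.0                                  # h_1 = 1, as in the text


def G(tau):
    """tau-dependent part of E[L(tau)]."""
    return np.sum(h * f**2 * np.cos(2 * np.pi * k * (tau - theta)) ** 2)


G2_formula = -2.0 * np.sum(h * f**2 * (2 * np.pi * k) ** 2)
d = 1e-4
G2_numeric = (G(theta + d) - 2.0 * G(theta) + G(theta - d)) / d**2

taus = np.linspace(-0.2, 0.2, 2001)
taus = taus[np.abs(taus - theta) > 1e-6]    # avoid dividing by zero at tau = theta
ratios = np.array([(G(t) - G(theta)) / (t - theta) ** 2 for t in taus])
C = -ratios.max()                           # largest C with G(t) - G(theta) <= -C (t - theta)^2
print(G2_formula, G2_numeric, C)
```

Because every term $\cos^2[2\pi k(\tau-\theta)]$ is maximal at $\tau = \theta$ and the $k = 1$ term is strictly smaller elsewhere on the grid, the computed $C$ is positive, mirroring the two steps of the proof.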

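Now we are ready to show that $\hat\theta_{AD}$ is $\varepsilon$-consistent.

Lemma 5. Let Assumptions A1-A3 and B be satisfied. Then, uniformly in $f \in \mathcal F$ and in $\theta \in \Theta$,

$$\mathbf P_{\theta,f}\big\{|\hat\theta_{AD}-\theta|\,(I^\varepsilon(f))^{1/2} > x\big\} \le C_4\exp(-C_5x^2)$$

for all $x \in [x_0, \|h'\|/\max_k h_k(2\pi k)]$, where $C_4 > 0$, $C_5 > 0$, $x_0 > 0$ are constants depending only on $T_0$, $\rho$, $C_0$, $C_1$.

Proof. Due to Lemma 4 we have

$$\begin{aligned}\mathbf P_{\theta,f}\big\{|\hat\theta_{AD}-\theta|(I^\varepsilon(f))^{1/2} > x\big\} &\le \mathbf P\Big\{\sup_{\tau:\,|\tau-\theta|(I^\varepsilon(f))^{1/2}>x}\big[-C|\tau-\theta|^2 + 2\varepsilon\|f'\|(\eta_1(\tau)-\eta_1(\theta)) + \varepsilon^2(\eta_2(\tau)-\eta_2(\theta))\big] \ge 0\Big\}\\ &\le \mathbf P\Big\{\sup_{\tau:\,|\tau-\theta|(I^\varepsilon(f))^{1/2}>x}|\tau-\theta|\Big(-C|\tau-\theta| + 2\varepsilon\|f'\|\max_{t\in\Theta}|\eta_1'(t)| + \varepsilon^2\max_{t\in\Theta}|\eta_2'(t)|\Big) \ge 0\Big\}\\ &\le \mathbf P\Big\{\max_{t\in\Theta}|\eta_1'(t)| + \varepsilon\max_{t\in\Theta}|\eta_2'(t)| \ge Cx\Big\}\\ &\le \mathbf P\Big\{\max_{t\in\Theta}|\eta_1'(t)| \ge Cx\Big\} + \mathbf P\Big\{\varepsilon\max_{t\in\Theta}|\eta_2'(t)| \ge Cx\Big\}.\end{aligned}$$

The first probability on the last line is controlled by Lemma 1, whereas the second probability can be bounded, in view of Lemma 3, by $4\exp(-Cx^2)$, since according to Assumption B2 one has $\varepsilon\sum_{k=1}^\infty h_k(2\pi k) \le C'$, $\varepsilon\|h'\| \le C'$, where $C'$ depends only on $C_1$, and thus $Cx \ge 4\varepsilon\sum_{k=1}^\infty h_k(2\pi k) + c\varepsilon x\|h'\|$ for any $x \ge x_0$ if $x_0$ is large enough and $c > 0$ is small enough. □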

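7.2. Proof of Theorem 1. Let us introduce the event $A_1 = \{|\hat\theta_{AD}-\theta| \le c_6\varepsilon\sqrt{\log(\varepsilon^{-2})}\}$, where $c_6 > 0$ is a sufficiently large constant that can depend only on $T_0$, $\rho$, $C_0$ and $C_1$. The risk of $\hat\theta_{AD}$ can be decomposed into two terms,

$$\mathbf E_{\theta,f}[(\hat\theta_{AD}-\theta)^2] = \mathbf E_{\theta,f}[(\hat\theta_{AD}-\theta)^2\mathbf 1_{A_1}] + \mathbf E_{\theta,f}[(\hat\theta_{AD}-\theta)^2\mathbf 1_{\bar A_1}]. \tag{39}$$

Using (24) and Lemma 5 we find that, for $c_6$ large enough,

$$\mathbf E_{\theta,f}[(\hat\theta_{AD}-\theta)^2I^\varepsilon(f)\mathbf 1_{\bar A_1}] \le C\varepsilon^{-2}\mathbf P_{\theta,f}(\bar A_1) = O(\varepsilon^2)\quad\text{as }\varepsilon\to0. \tag{40}$$

Indeed, for $x = c_6\varepsilon\sqrt{I^\varepsilon(f)\log(\varepsilon^{-2})} \ge C\sqrt{\log(\varepsilon^{-2})}$, due to Assumption B1 and (24), one has $x\max_k h_k(2\pi k)/\|h'\| \le 1$ for $\varepsilon$ small enough. Thus we can apply Lemma 5, which yields (40) when $c_6$ is large enough. It remains to find the asymptotics of the first term on the right-hand side of (39). The estimator $\hat\theta_{AD}$ satisfies

$$L'(\hat\theta_{AD}) = 0. \tag{41}$$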

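Using Taylor approximation of the left-hand side of (41) in a neighborhood of $\theta$ we may write, for some $\omega \in \Theta$,

$$L_0(\theta) + (\theta-\hat\theta_{AD})L_1(\theta) + \tfrac12(\theta-\hat\theta_{AD})^2L_2(\omega) = 0, \tag{42}$$

where

$$L_0(u) = \sum_{k=1}^\infty h_k(2\pi k)\Big(\sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big)\Big(\sqrt2\int_{-1/2}^{1/2}\sin[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big),$$

$$L_1(u) = \sum_{k=1}^\infty h_k(2\pi k)^2\Big[\Big(\sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big)^2 - \Big(\sqrt2\int_{-1/2}^{1/2}\sin[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big)^2\Big],$$

$$L_2(u) = -4\sum_{k=1}^\infty h_k(2\pi k)^3\Big(\sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big)\Big(\sqrt2\int_{-1/2}^{1/2}\sin[2\pi k(t-u)]x^\varepsilon(t)\,dt\Big),$$

so that $L'(u) = 2L_0(u)$, $L_1(u) = -L_0'(u)$ and $L_2(u) = L_0''(u)$.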

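Lemma 6. Let Assumptions A1-A3 and B be satisfied. Then

$$\sup_{f\in\mathcal F,\,\theta\in\Theta}\mathbf E_{\theta,f}\big[(L_1(\theta)-\mathbf E_{\theta,f}[L_1(\theta)])^2\big] = O(\varepsilon^2)\quad\text{as }\varepsilon\to0,$$

and

$$\sup_{f\in\mathcal F,\,\theta\in\Theta}\mathbf E_{\theta,f}\Big[\sup_{\omega\in\Theta}|L_2(\omega)|^2\Big] \le C.$$

Proof. We omit the proof of the first relation since it follows from simple algebra. To prove the second one, using trigonometric formulae and the Cauchy-Schwarz inequality, we write

$$\Big|\sqrt2\int_{-1/2}^{1/2}\cos[2\pi k(t-\omega)]x^\varepsilon(t)\,dt\Big| = \big|f_k\cos[2\pi k(\theta-\omega)] + \varepsilon\xi_k(0)\cos(2\pi k\omega) + \varepsilon\xi_k^*(0)\sin(2\pi k\omega)\big| \le |f_k| + \varepsilon\sqrt{\xi_k(0)^2+\xi_k^*(0)^2}.$$

Similarly, $\big|\sqrt2\int_{-1/2}^{1/2}\sin[2\pi k(t-\omega)]x^\varepsilon(t)\,dt\big| \le |f_k| + \varepsilon\sqrt{\xi_k(0)^2+\xi_k^*(0)^2}$. Therefore

$$|L_2(\omega)| \le C\sum_{k=1}^\infty h_k(2\pi k)^3f_k^2 + C\sum_{k=1}^\infty h_k(2\pi k)^3\varepsilon^2\big[\xi_k^2(0)+\xi_k^{*2}(0)\big].$$

The second inequality of the lemma follows easily from this and Assumptions A3 and B2. □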

To analyze the behavior of $\hat\theta_{AD}$ we compare it to the root $\tilde\tau$ of the linear equation

$$L_0(\theta) + (\theta-\tilde\tau)\mathbf E_{\theta,f}[L_1(\theta)] = 0, \tag{43}$$

representing an approximation of (42).



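Lemma 7. Let Assumptions A1-A3, B and C be satisfied. Then

$$\mathbf E_{\theta,f}[(\tilde\tau-\theta)^2I^\varepsilon(f)] = 1 + (1+o(1))\frac{R^\varepsilon[f,h]}{\|f'\|^2},$$

where $o(1) \to 0$ uniformly in $f \in \mathcal F$ and in $\theta \in \Theta$, as $\varepsilon \to 0$.

Proof. Using the inequality $(1-h_k^2)f_k^2(2\pi k)^2 \le 2(1-h_k)f_k^2(2\pi k)^2$, Assumption C and (24), we get from (43),

$$\begin{aligned}\mathbf E_{\theta,f}[(\tilde\tau-\theta)^2I^\varepsilon(f)] &= \frac{1 + \|f'\|^{-2}\sum_{k=1}^\infty[(h_k^2-1)f_k^2+\varepsilon^2h_k^2](2\pi k)^2}{\big[1+\|f'\|^{-2}\sum_{k=1}^\infty(h_k-1)f_k^2(2\pi k)^2\big]^2}\\ &= 1 + (1+o(1))\|f'\|^{-2}\sum_{k=1}^\infty[(h_k-1)^2f_k^2+\varepsilon^2h_k^2](2\pi k)^2\\ &= 1 + (1+o(1))\|f'\|^{-2}R^\varepsilon[f,h].\end{aligned}$$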



□ 
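The expansion in the proof of Lemma 7 rests on an exact identity. Writing $a$ and $b$ for the normalized sums in the numerator and in the denominator, one has $a - 2b = \|f'\|^{-2}\sum_k[(h_k-1)^2f_k^2+\varepsilon^2h_k^2](2\pi k)^2$, hence $(1+a)/(1+b)^2 - 1 = (a-2b-b^2)/(1+b)^2$: the ratio differs from $1 + \|f'\|^{-2}R^\varepsilon[f,h]$ only through $b$, which Assumption C keeps small. The Python sketch below (NumPy assumed) checks this for hypothetical $f_k = k^{-3}$, weights $h_k$ close to 1 and a finite truncation $K$, taking $R^\varepsilon[f,h] = \sum_k[(1-h_k)^2f_k^2+\varepsilon^2h_k^2](2\pi k)^2$.

```python
import numpy as np

K, eps = 50, 1e-3
k = np.arange(1, K + 1)
w = (2 * np.pi * k) ** 2
f = k ** -3.0                         # hypothetical Fourier coefficients f_k
h = 1.0 - 1e-3 * k / K                # hypothetical weights, close to 1

fp2 = np.sum(w * f**2)                                      # ||f'||^2
risk = np.sum(((1 - h) ** 2 * f**2 + eps**2 * h**2) * w)    # R^eps[f, h]
a = np.sum(((h**2 - 1) * f**2 + eps**2 * h**2) * w) / fp2
b = np.sum((h - 1) * f**2 * w) / fp2
ratio = (1 + a) / (1 + b) ** 2        # normalized risk of the linearized root

exact_gap = (risk / fp2 - b**2) / (1 + b) ** 2
print(ratio - 1, risk / fp2, exact_gap)
```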



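Lemma 8. Let Assumptions A1-A3 and B be satisfied. Then $\mathbf E_{\theta,f}[(\hat\theta_{AD}-\tilde\tau)^2\mathbf 1_{A_1}] \le C\varepsilon^4\log^2(\varepsilon^{-2})$.

Proof. Since no confusion is possible, we omit the subscripts $\theta, f$ of the expectation. Subtracting (43) from (42) we obtain

$$(\tilde\tau-\hat\theta_{AD})\mathbf E[L_1(\theta)] + (\theta-\hat\theta_{AD})\big(L_1(\theta)-\mathbf E[L_1(\theta)]\big) + \tfrac12(\theta-\hat\theta_{AD})^2L_2(\omega) = 0.$$

Note that $\mathbf E[L_1(\theta)] = \sum_k h_k(2\pi k)^2f_k^2 \ge (2\pi)^2\rho$ and that $(\hat\theta_{AD}-\theta)^2 \le c_6^2\varepsilon^2\log(\varepsilon^{-2})$ on $A_1$. Using these facts and Lemma 6 we get

$$\begin{aligned}\mathbf E[(\hat\theta_{AD}-\tilde\tau)^2\mathbf 1_{A_1}] &\le (\mathbf E[L_1(\theta)])^{-2}\Big(2\mathbf E\big[(\theta-\hat\theta_{AD})^2(L_1(\theta)-\mathbf E[L_1(\theta)])^2\mathbf 1_{A_1}\big] + \tfrac12\mathbf E\Big[(\theta-\hat\theta_{AD})^4\sup_{\omega}|L_2(\omega)|^2\mathbf 1_{A_1}\Big]\Big)\\ &\le C\varepsilon^4\log^2(\varepsilon^{-2}).\end{aligned}$$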



□ 



Now Assumption B1 and the fact that $h_1 = 1$ yield, for $\varepsilon$ small enough,

oo . .2 

>e^5](27rA;)2/i| >pie2fmax/ifc(27rA:)j log^(e-2) 



k=l 



>pi(2vr)Vlog^(e-^), 

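which implies that $\mathbf E_{\theta,f}[(\hat\theta_{AD}-\tilde\tau)^2I^\varepsilon(f)\mathbf 1_{A_1}] = o(R^\varepsilon[f,h])$ uniformly in $f \in \mathcal F$ and in $\theta \in \Theta$, as $\varepsilon \to 0$. This result together with (39), (40) and Lemma 7 completes the proof of Theorem 1.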
8. Proof of Theorem 2. Before proceeding to the proof of Theorem 2 we give some preliminary results.

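8.1. An auxiliary Bayesian problem. We consider a model with two observations that will be used as a building block for the subsequent proofs. Set

$$x = f_0\cos(2\pi k\theta) + \varepsilon\xi,\qquad x^* = f_0\sin(2\pi k\theta) + \varepsilon\xi^*,$$

where $\xi, \xi^*$ are independent $\mathcal N(0,1)$ random variables and $f_0$ is an $\mathcal N(f, \sigma^2)$ random variable that does not depend on $(\xi, \xi^*)$, with $f \in \mathbb R$, $\sigma^2 > 0$. Here $\theta$ is a parameter to be estimated based on the observations $x, x^*$ and $k$ is an integer. Define the Fisher information

$$J^\varepsilon(\theta) = \int_{\mathbb R^2}\Big(\frac{\partial}{\partial\theta}\log p_\theta(x,x^*)\Big)^2p_\theta(x,x^*)\,dx\,dx^*,$$

where $p_\theta(x,x^*)$ is the probability density of the observations.

Lemma 9. We have $J^\varepsilon(\theta) = \varepsilon^{-2}(f^2+\lambda\sigma^2)(2\pi k)^2$, where $\lambda = \sigma^2/(\varepsilon^2+\sigma^2)$, for any $k \in \mathbb Z$.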

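Proof. Denoting by $C$ multiplicative constants that do not depend on $\theta$, we have

$$\begin{aligned}p_\theta(x,x^*) &= C\int\exp\Big\{-\frac{1}{2\varepsilon^2}[x-f\cos(2\pi k\theta)-u\cos(2\pi k\theta)]^2 - \frac{1}{2\varepsilon^2}[x^*-f\sin(2\pi k\theta)-u\sin(2\pi k\theta)]^2 - \frac{u^2}{2\sigma^2}\Big\}\,du\\ &= C\exp\Big\{\frac{\lambda}{2\varepsilon^2}[x\cos(2\pi k\theta)+x^*\sin(2\pi k\theta)]^2 + \frac{(1-\lambda)f}{\varepsilon^2}[x\cos(2\pi k\theta)+x^*\sin(2\pi k\theta)]\Big\}\\ &= C\exp\Big\{\frac{\lambda}{2\varepsilon^2}\Big[x\cos(2\pi k\theta)+x^*\sin(2\pi k\theta)+\frac{1-\lambda}{\lambda}f\Big]^2\Big\},\end{aligned}$$

where $\lambda = \sigma^2/(\varepsilon^2+\sigma^2)$. Hence writing $f_0 = f + \eta\sigma$, where $\eta \sim \mathcal N(0,1)$ and $\eta$ is independent of $(\xi, \xi^*)$, one obtains

$$\begin{aligned}\frac{\varepsilon^2}{\lambda}\frac{\partial}{\partial\theta}\log p_\theta(x,x^*) &= (2\pi k)\Big[x\cos(2\pi k\theta)+x^*\sin(2\pi k\theta)+\frac{1-\lambda}{\lambda}f\Big]\big[-x\sin(2\pi k\theta)+x^*\cos(2\pi k\theta)\big]\\ &= (2\pi k)\varepsilon\Big\{\Big(\frac{f}{\lambda}+\eta\sigma\Big)\big[-\xi\sin(2\pi k\theta)+\xi^*\cos(2\pi k\theta)\big] + \varepsilon\Big[\tfrac12(\xi^{*2}-\xi^2)\sin(4\pi k\theta)+\xi\xi^*\cos(4\pi k\theta)\Big]\Big\},\end{aligned}$$

and therefore

$$J^\varepsilon(\theta) = (2\pi k)^2\varepsilon^{-2}\lambda^2\big[\lambda^{-2}f^2+\sigma^2+\varepsilon^2\big] = \varepsilon^{-2}(f^2+\lambda\sigma^2)(2\pi k)^2.$$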



□ 
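Lemma 9 can be cross-checked with the general formula for the Fisher information of a Gaussian family $\mathcal N(\mu(\theta), \Sigma(\theta))$, namely $\mu'^\top\Sigma^{-1}\mu' + \tfrac12\operatorname{tr}(\Sigma^{-1}\Sigma'\Sigma^{-1}\Sigma')$. After integrating out $f_0$, the pair $(x, x^*)$ is Gaussian with mean $f\,u(\theta)$ and covariance $\sigma^2u(\theta)u(\theta)^\top + \varepsilon^2 I_2$, where $u(\theta) = (\cos 2\pi k\theta, \sin 2\pi k\theta)^\top$. The Python sketch below (NumPy assumed; helper names are illustrative) evaluates that formula with central finite differences for one hypothetical set of values of $k, f, \sigma, \varepsilon, \theta$ and compares it with $\varepsilon^{-2}(f^2+\lambda\sigma^2)(2\pi k)^2$.

```python
import numpy as np


def gaussian_fisher(mean_fn, cov_fn, theta, d=1e-5):
    """Fisher information of N(mean_fn(theta), cov_fn(theta)) via central differences."""
    m1 = (mean_fn(theta + d) - mean_fn(theta - d)) / (2 * d)
    S = cov_fn(theta)
    S1 = (cov_fn(theta + d) - cov_fn(theta - d)) / (2 * d)
    S_inv = np.linalg.inv(S)
    return m1 @ S_inv @ m1 + 0.5 * np.trace(S_inv @ S1 @ S_inv @ S1)


def two_observation_model(k, f, sigma, eps):
    """Mean and covariance of (x, x*) once f_0 ~ N(f, sigma^2) is integrated out."""
    def u(theta):
        return np.array([np.cos(2 * np.pi * k * theta), np.sin(2 * np.pi * k * theta)])

    def mean_fn(theta):
        return f * u(theta)

    def cov_fn(theta):
        return sigma**2 * np.outer(u(theta), u(theta)) + eps**2 * np.eye(2)

    return mean_fn, cov_fn


k, f, sigma, eps, theta = 3, 0.8, 0.5, 0.3, 0.11   # hypothetical values
mean_fn, cov_fn = two_observation_model(k, f, sigma, eps)
lam = sigma**2 / (sigma**2 + eps**2)
J_numeric = gaussian_fisher(mean_fn, cov_fn, theta)
J_lemma9 = (f**2 + lam * sigma**2) * (2 * np.pi * k) ** 2 / eps**2
print(J_numeric, J_lemma9)
```

The first term of the Gaussian formula reproduces $\varepsilon^{-2}f^2(2\pi k)^2$ and the trace term reproduces $\varepsilon^{-2}\lambda\sigma^2(2\pi k)^2$, which is where the prior variance enters with the shrinkage factor $\lambda$.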



8.2. Lower bounds for Bayes risks. In this subsection we consider the sequence model (17) where we suppose that the $f_k$'s are no longer fixed values but independent random variables distributed as $\mathcal N(\bar f_k, \sigma_k^2)$ with some $\sigma_k \ge 0$. By convention, $\sigma_k = 0$ means that the corresponding $f_k$ is equal to $\bar f_k$ almost surely. We assume in what follows that $\sigma_k > 0$ only for a finite (and possibly depending on $\varepsilon$) number of indices $k$. We also assume that the random sequence $(f_k, k = 1, 2, \dots)$ does not depend on the noises $(\xi_k, \xi_k^*, k = 1, 2, \dots)$. We will refer to this model as the Bayes model with fixed $\theta$. Let $\mu_\sigma(df)$ denote the probability distribution of $f = \{f_k\} \in \ell_2$ in this model.

Along with this, we will consider the full Bayes model defined in the same way, except that in this new model $\theta$ is supposed to be a random variable having a density $\pi(x)$, $x \in \Theta$, that vanishes at the endpoints of the interval and has finite Fisher information $I_\pi = \int(\pi'(x))^2\pi^{-1}(x)\,dx$. It will be assumed that $\theta$ is independent of $(f_k, \xi_k, \xi_k^*, k = 1, 2, \dots)$.

We denote by $\mathbf E$ the expectation with respect to the joint distribution of $(x_k, x_k^*, k = 1, 2, \dots)$ and $\theta$ in the full Bayes model and by $\mathbf E_\theta$ the expectation w.r.t. the distribution of $(x_k, x_k^*, k = 1, 2, \dots)$ in the Bayes model with fixed
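$\theta$. Define

$$\lambda_k = \frac{\sigma_k^2}{\sigma_k^2+\varepsilon^2},\qquad k = 1, 2, \dots.$$

Lemma 10. Assume that the density $\pi(x)$ vanishes at the endpoints of the interval $\Theta$ and has finite Fisher information $I_\pi$. Then

$$\inf_{\hat\theta_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2]\,\bar I^\varepsilon \ge \Big(1 + \frac{I_\pi - \sum_{k=1}^\infty(2\pi k)^2\lambda_k}{\bar I^\varepsilon}\Big)^{-1}, \tag{44}$$

where

$$\bar I^\varepsilon = \int I^\varepsilon(f)\,\mu_\sigma(df) = \varepsilon^{-2}\sum_{k=1}^\infty(2\pi k)^2(\bar f_k^2+\sigma_k^2).$$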
The proofs of this and subsequent lemmas are given in the Appendix.
In the next subsection we will show that one can choose the sequence $\{\sigma_k\}$ so that the right-hand side of (44) coincides asymptotically with the lower bound of Theorem 2. However, the left-hand side of (44) is different from that of (32). One difference is that in (32) the risk is normalized by the Fisher information $I^\varepsilon(f)$, while in (44) we have its average w.r.t. the distribution $\mu_\sigma(df)$. The next lemma shows that $I^\varepsilon(f)$ is sufficiently close to $\bar I^\varepsilon$; in particular, its variance is small enough.

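Lemma 11. If the $\sigma_k^2$'s are such that

$$\sum_{k=1}^\infty(2\pi k)^4\sigma_k^4 + \sup_k\sigma_k^2 = o\Big(\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k\Big), \tag{45}$$

then

Lemma 12. Assume that the density $\pi(x)$ vanishes at the endpoints of the interval $\Theta$ and has finite Fisher information $I_\pi = \int(\pi'(x))^2\pi^{-1}(x)\,dx$. Then, for any $f \in \mathcal F$ and any estimator $\hat\theta_\varepsilon$,

$$\int_\Theta\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2]\,\pi(\theta)\,d\theta \ge \frac{1}{I^\varepsilon(f)+I_\pi}.$$

The proof of this lemma is omitted: this is the standard Van Trees inequality for the problem of estimation of $\theta$ with fixed $f$ in model (1) ([33]; see also [7]).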

Lemma 13. If the sequence $\{\sigma_k^2\}$ satisfies relation (45) and $\delta < \sqrt{\rho/2}$, then



2 



(46) ai 



8.3. From Bayes to minimax bounds. The main idea of the proof of Theorem 2 is to bound from below the minimax risk by a suitably chosen Bayes risk. In the rest of this section we consider the full Bayes model defined in Section 8.2 with a special choice of the $\sigma_k^2$'s. Namely, we set

0, k<-ieWe, 

where $W_\varepsilon$ is a solution of (26), $s_k^2$ is defined by (28) and $\gamma_\varepsilon = 1/\log(\varepsilon^{-2})$ (here and later we suppose that $\varepsilon$ is small enough, so that $\gamma_\varepsilon < 1$). To derive the minimax lower bound of Theorem 2 from the Bayes bounds of Section 8.2 we need first to show that with a probability close to 1 the Gaussian random sequence $\{f_k\}$ belongs to the set $\mathcal F_\delta(\bar f)$. In fact, the following result holds.
Lemma 14. For any 5^ > e^We-f^'^'^ we have P{/ ^ Fs{f)} < e-^^"^- .

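8.4. Proof of Theorem 2. Recall that we consider the full Bayes model with the $\sigma_k^2$'s chosen according to (46) and $\lambda_k = \sigma_k^2/(\varepsilon^2+\sigma_k^2)$. Note that in this case

$$\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k = r^\varepsilon(1+o(1))\quad\text{as }\varepsilon\to0. \tag{47}$$

Indeed, (25) and (28) imply that $|\lambda_k/q_k - 1| \le \gamma_\varepsilon$ for $k > \gamma_\varepsilon W_\varepsilon$, and hence [cf. (29)]

$$\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k = (1+o(1))\,\varepsilon^2\sum_{k>\gamma_\varepsilon W_\varepsilon}(2\pi k)^2q_k = (1+o(1))\Big(r^\varepsilon - \varepsilon^2\sum_{k\le\gamma_\varepsilon W_\varepsilon}(2\pi k)^2q_k\Big).$$

Here [cf. (30)]

$$\varepsilon^2\sum_{k\le\gamma_\varepsilon W_\varepsilon}(2\pi k)^2q_k \le C\varepsilon^2W_\varepsilon^3\int_0^{\gamma_\varepsilon}(x^2+x^{\beta+1})\,dx \le C\gamma_\varepsilon^3\varepsilon^2W_\varepsilon^3 = o(r^\varepsilon),$$

and thus (47) follows.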

Next, we check that if the $\sigma_k^2$'s are chosen according to (46), then condition (45) is satisfied, so that one can apply Lemmas 10-13. In fact, (27) yields $W_\varepsilon \asymp \varepsilon^{-2/(2\beta+1)}$ with $\beta > 1$, and using (47), (28) and (30) we get, as $\varepsilon \to 0$,

$$\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k \asymp \varepsilon^2W_\varepsilon^3 \to 0,$$

oo oo 
k=l ^eWe<k<We 

$$\sup_k\sigma_k^2 \le \varepsilon^2\gamma_\varepsilon^{-1} = o\Big(\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k\Big).$$

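Now we start the main body of the proof of Theorem 2. First note that, in a standard way, conditioning on $(x_k, x_k^*, k = 1, 2, \dots)$ and using Jensen's inequality, one can easily show that it is sufficient to prove the lower bound of Theorem 2 for estimators $\hat\theta_\varepsilon$ depending on $X^\varepsilon$ only via $(x_k, x_k^*, k = 1, 2, \dots)$. Let $\mathcal T_\varepsilon$ denote the set of all estimators $\hat\theta_\varepsilon$ of $\theta$ measurable with respect to $(x_k, x_k^*, k = 1, 2, \dots)$ and satisfying the inequalities

$$\sup_{\theta\in\Theta}\ \sup_{f\in\mathcal F_\delta(\bar f)}\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)] \le 1 + \frac{2r^\varepsilon}{\|\bar f'\|^2}\quad\text{and}\quad|\hat\theta_\varepsilon| \le 1. \tag{48}$$

It is enough to restrict our attention to the estimators from $\mathcal T_\varepsilon$, since for estimators that do not satisfy one of the inequalities in (48) the lower bound of Theorem 2 is evident. Clearly,

$$\sup_{f\in\mathcal F_\delta(\bar f)}\int_\Theta\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)]\,\pi(\theta)\,d\theta \le 1 + \frac{2r^\varepsilon}{\|\bar f'\|^2}\qquad\forall\hat\theta_\varepsilon\in\mathcal T_\varepsilon. \tag{49}$$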
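We have

$$\begin{aligned}\inf_{\hat\theta_\varepsilon}\sup_{\theta\in\Theta,\,f\in\mathcal F_\delta(\bar f)}\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)] &\ge \inf_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)\mathbf 1_{\mathcal F_\delta(\bar f)}(f)]\\ &\ge \inf_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2\bar I^\varepsilon\mathbf 1_{\mathcal F_\delta(\bar f)}(f)] - \sup_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2(\bar I^\varepsilon-I^\varepsilon(f))\mathbf 1_{\mathcal F_\delta(\bar f)}(f)]\\ &\ge \inf_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2\bar I^\varepsilon] - o(\varepsilon^2) - \sup_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2(\bar I^\varepsilon-I^\varepsilon(f))\mathbf 1_{\mathcal F_\delta(\bar f)}(f)]\\ &\ge \inf_{\hat\theta_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2\bar I^\varepsilon] - o(\varepsilon^2) - \sup_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2(\bar I^\varepsilon-I^\varepsilon(f))\mathbf 1_{\mathcal F_\delta(\bar f)}(f)],\end{aligned}\tag{50}$$

where we have used the inequality

$$\sup_{\hat\theta_\varepsilon\in\mathcal T_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2\bar I^\varepsilon\mathbf 1_{\mathcal F_\delta^c(\bar f)}(f)] \le C\varepsilon^{-2}e^{-C\gamma_\varepsilon W_\varepsilon} = o(\varepsilon^2),$$

which is a direct consequence of the estimates $|\hat\theta_\varepsilon| \le 1$, $|\theta| \le 1/4$, $\bar I^\varepsilon \le C\varepsilon^{-2}$, relation (27) and Lemma 14. The last term in (50) can be represented as

$$\mathbf E[(\hat\theta_\varepsilon-\theta)^2(\bar I^\varepsilon-I^\varepsilon(f))\mathbf 1_{\mathcal F_\delta(\bar f)}(f)] = -\mathbf E\big\{[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)-1]\big(1-\bar I^\varepsilon/I^\varepsilon(f)\big)\mathbf 1_{\mathcal F_\delta(\bar f)}(f)\big\} - \mathbf E\big[\big(1-\bar I^\varepsilon/I^\varepsilon(f)\big)\mathbf 1_{\mathcal F_\delta(\bar f)}(f)\big]. \tag{51}$$

Due to Lemmas 13 and 14, the second term on the right-hand side of (51) is asymptotically negligible with respect to $\varepsilon^2\sum_k(2\pi k)^2\lambda_k = r^\varepsilon(1+o(1))$ [cf. (47)]. To evaluate the first term, note that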
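$$\begin{aligned}\Big|\mathbf E\big\{[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)-1]\big(1-\bar I^\varepsilon/I^\varepsilon(f)\big)\mathbf 1_{\mathcal F_\delta(\bar f)}(f)\big\}\Big| &\le \sup_{f\in\mathcal F_\delta(\bar f)}\Big|\int_\Theta\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)-1]\,\pi(\theta)\,d\theta\Big|\;\mathbf E\big(\big|1-\bar I^\varepsilon/I^\varepsilon(f)\big|\mathbf 1_{\mathcal F_\delta(\bar f)}(f)\big)\\ &\le C\varepsilon^2\sup_{f\in\mathcal F_\delta(\bar f)}\Big|\int_\Theta\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)]\,\pi(\theta)\,d\theta-1\Big|\,\big[\mathbf E(\bar I^\varepsilon-I^\varepsilon(f))^2\big]^{1/2}.\end{aligned}\tag{52}$$

It follows from (58) and Lemma 11 that $\varepsilon^4\mathbf E(\bar I^\varepsilon-I^\varepsilon(f))^2$ is $o(1)$. Now, Lemma 12, inequality (49) and the fact that $\sup_{f\in\mathcal F_\delta(\bar f)}I_\pi/I^\varepsilon(f) \le C\varepsilon^2 = o(r^\varepsilon)$ [cf. (58)] imply that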



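$$\sup_{f\in\mathcal F_\delta(\bar f)}\Big|\int_\Theta\mathbf E_{\theta,f}[(\hat\theta_\varepsilon-\theta)^2I^\varepsilon(f)]\,\pi(\theta)\,d\theta - 1\Big| = O(r^\varepsilon). \tag{53}$$

Plugging (51)-(53) in (50) and using Lemmas 10, 13 and 14, we get

$$\begin{aligned}\inf_{\hat\theta}\sup_{\theta\in\Theta,\,f\in\mathcal F_\delta(\bar f)}\mathbf E_{\theta,f}[(\hat\theta-\theta)^2I^\varepsilon(f)] &\ge \inf_{\hat\theta}\mathbf E[(\hat\theta-\theta)^2\bar I^\varepsilon] + o(r^\varepsilon)\\ &\ge 1 + (1+o(1))\frac{\varepsilon^2\sum_{k=1}^\infty(2\pi k)^2\lambda_k}{\|\bar f'\|^2} = 1 + (1+o(1))\frac{r^\varepsilon}{\|\bar f'\|^2},\end{aligned}$$

where for the last equality we have used (47) and the fact that, due to (28) and (46),

$$\big|\varepsilon^2\bar I^\varepsilon - \|\bar f'\|^2\big| \le C\varepsilon^2W_\varepsilon^3\gamma_\varepsilon^{-1} = o(1),$$

as $\varepsilon \to 0$.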



9. Proof of Theorem 3. It is enough to check that Assumptions B and C are satisfied for $h_k = \lambda_k^*$ and that (34) holds. We first check Assumption
C. Recall that we supposed w.l.o.g. that 7£ < 1. Then 1 — > 7^ ~^ for 

$k > \gamma_\varepsilon W_\varepsilon$, and we have



$$\sum_{k=1}^\infty(1-\lambda_k^*)(2\pi k)^2f_k^2 = \sum_{k>\gamma_\varepsilon W_\varepsilon}(1-\lambda_k^*)(2\pi k)^2f_k^2.$$

This and (30) show that Assumption C is satisfied for $h_k = \lambda_k^*$. Using (27) we find that Assumption B also holds. Indeed, Assumption B2 amounts to checking that $\varepsilon^2W_\varepsilon^3 \le C_1$, which is clearly the case for $\beta > 1$, whereas Assumption B1 follows from the fact that $\sqrt{W_\varepsilon}$ grows polynomially in $\varepsilon^{-1}$, hence faster than any power of $\log(\varepsilon^{-2})$, as $\varepsilon \to 0$. Now we are ready to check (34). For any $\kappa \in [0,1]$ one obtains [recall that the $q_k$'s are defined by (28)]



snp _R'[f,X*]< sup Y.i^-\lf{27rkf{fk + Vk) 



2 

k) 



(54) 

cy OO 

+ e' a-ql){27rkf, 
where for the last inequality we have used (29). From (36) we obtain

OO 

k=l ky'yeWe 

Note also that due to the relations $r^\varepsilon \ge C\varepsilon^2W_\varepsilon^3$ [cf. (30)] and $\gamma_\varepsilon \to 0$ we get
Y {ql-m.kf <2e' E {^T\2^kf <Ce'lt'w! = o[r^). 

These inequalities and (54) prove (34), since $\kappa$ can be arbitrarily small.

APPENDIX 



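Proof of Lemma 10. We start by applying the Van Trees inequality ([33]; see also [7]):

$$\inf_{\hat\theta_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2] \ge \Big(\int J^\varepsilon(\theta)\pi(\theta)\,d\theta + I_\pi\Big)^{-1}, \tag{55}$$

where $J^\varepsilon(\theta)$ is the Fisher information on $\theta$ contained in the observations $(x_k, x_k^*, k = 1, 2, \dots)$ for the Bayes model with fixed $\theta$. Since these observations are independent, $J^\varepsilon(\theta)$ is the sum over $k$ of the Fisher information of the pairs $(x_k, x_k^*)$. So using Lemma 9 we get that $J^\varepsilon(\theta)$ does not depend on $\theta$ and equals

$$J^\varepsilon(\theta) = \varepsilon^{-2}\sum_{k=1}^\infty(2\pi k)^2(\bar f_k^2+\lambda_k\sigma_k^2) = \bar I^\varepsilon - \sum_{k=1}^\infty(2\pi k)^2\lambda_k.$$

Therefore

$$\inf_{\hat\theta_\varepsilon}\mathbf E[(\hat\theta_\varepsilon-\theta)^2] \ge \Big(\bar I^\varepsilon + I_\pi - \sum_{k=1}^\infty(2\pi k)^2\lambda_k\Big)^{-1}. \tag{56}$$

To complete the proof, it is enough to remark that, in view of (24),

$$\bar I^\varepsilon \ge \varepsilon^{-2}\|\bar f'\|^2 \ge \varepsilon^{-2}(2\pi)^2\rho. \tag{57}$$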



□ 
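The identity $J^\varepsilon(\theta) = \bar I^\varepsilon - \sum_k(2\pi k)^2\lambda_k$ used in this proof follows from $(1-\lambda_k)\sigma_k^2 = \varepsilon^2\lambda_k$. The Python sketch below (NumPy assumed) checks it for hypothetical values of $\bar f_k$ and $\sigma_k^2$ (zero for small $k$, as when no prior is put on the first coefficients) and evaluates the resulting Van Trees bound $(J^\varepsilon + I_\pi)^{-1}$. As an illustrative prior it uses the density $\pi(x) = 4\cos^2(2\pi x)$ on $[-1/4, 1/4]$, which vanishes at the endpoints and has Fisher information $I_\pi = 16\pi^2$.

```python
import numpy as np

eps = 0.05
k = np.arange(1, 41)
w = (2 * np.pi * k) ** 2
f_bar = 1.0 / k**3                              # hypothetical centre coefficients
sigma2 = np.where(k > 5, eps**2 / k, 0.0)       # hypothetical prior variances
lam = sigma2 / (sigma2 + eps**2)
I_pi = 16 * np.pi**2                            # prior 4 cos^2(2 pi x) on [-1/4, 1/4]

I_bar = np.sum(w * (f_bar**2 + sigma2)) / eps**2     # averaged Fisher information
J = np.sum(w * (f_bar**2 + lam * sigma2)) / eps**2   # Lemma 9, summed over k
van_trees = 1.0 / (J + I_pi)                          # lower bound on the Bayes risk

print(I_bar - J, np.sum(w * lam))
print(van_trees * I_bar, 1.0 / (1.0 + (I_pi - np.sum(w * lam)) / I_bar))
```

The second printed pair shows the same bound after multiplication by $\bar I^\varepsilon$: a positive $\sum_k(2\pi k)^2\lambda_k$ pushes the normalized bound above 1, which is the mechanism behind the positive second-order term.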



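Proof of Lemma 11. Using the independence of the $f_k$'s for different values of $k$, we get

$$\begin{aligned}\mathbf E\big(I^\varepsilon(f)-\bar I^\varepsilon\big)^2 &= \varepsilon^{-4}\sum_{k=1}^\infty(2\pi k)^4\,\mathbf E\big(f_k^2-\bar f_k^2-\sigma_k^2\big)^2 = \varepsilon^{-4}\sum_{k=1}^\infty(2\pi k)^4\big(4\bar f_k^2\sigma_k^2+2\sigma_k^4\big)\\ &\le 4\varepsilon^{-4}\Big(\|\bar f''\|^2\sup_k\sigma_k^2 + \sum_{k=1}^\infty(2\pi k)^4\sigma_k^4\Big).\end{aligned}$$

The assertion of the lemma follows now from (45). □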

Proof of Lemma 13. First note that using Assumption A2 one obtains 

(58) e^Pif) > {27rffl > {2n)\ff - 5^) > {27rfp/2, 
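for any $f \in \mathcal F_\delta(\bar f)$ and for any $\delta < \sqrt{\rho/2}$. Furthermore, by (24),

$$\varepsilon^2I^\varepsilon(f) \le 2(\|\bar f'\|^2+L) \le 2(C_0+L)\qquad\forall f\in\mathcal F_\delta(\bar f). \tag{59}$$

The elementary identity $1 - y^{-1} = -(1-y) - y^{-1}(1-y)^2$ yields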

F \. r (FU) 

'FsU) ' 



Hf) 



+ 



F 
F{f) 



FsU) 



r 



F{f) 



^Adf). 



To estimate the first integral on the right-hand side we note that $\bar I^\varepsilon = \int I^\varepsilon(f)\,\mu_\sigma(df)$; therefore using (57) and (59) we get



Fsif) 



F{f) 



U^Adf) 



Fs'if) 



$$\le C\,\mathbf P\big(f \notin \mathcal F_\delta(\bar f)\big),$$

where $\mathcal F_\delta^c(\bar f) = \ell_2 \setminus \mathcal F_\delta(\bar f)$. Finally, due to (57) and (58),



(60) 



The rest follows from Lemma 11. □ 

Proof of Lemma 14. Let $\eta_k$ be i.i.d. $\mathcal N(0,1)$ random variables. We have

+ p{ E i2nkr4sl>-r^]. 
We use Lemma 2 in order to evaluate the second probability. Note that 

and maxfc sl{2-Kkf^ < Ce^Wl^. Therefore, by Lemma 2, for any x < C^fWl 
we have 

Applying this inequality for X = 7^L/e2w|^+^/^ [note that in view of (27) x 
is less than C y/W^ ] , and using the fact that J2k>-feWei'^'^^)'^^ — 
Ce'^W^^jl~f^ = o(7e), one obtains 

p( Y (2-^«4>7T^|<p| Y i2.krivl-i)sl>^] 

The first probability on the right-hand side of (60) can be estimated simi- 
larly. We have Y.k>'y,We - Ce'^We%'^'^^'^ and maxfc>^iye < Ce^77^+^ 
Hence, by Lemma 2, for any x < CV^^p^, 

P( Y (4-l)sl>xe'VW,j-'^^-'/A<eM-Cx'). 
So with X = CyJ^^We, noting that I]fc>7eH/£ 4 — ^^^ele ^' obtains
P| E rjlsl>6A=p{ E {vl-l)sl>6'- E 4) 

<exp(-C7eW^e). □ 